Text File | 1994-03-20 | 184KB | 4,210 lines

SXT (TM)
SOFTWARE EXPLORATION TOOLS

CXT (TM) C EXPLORATION TOOLS
  * CFT (TM) C FUNCTION TREE GENERATOR
  * CST (TM) C STRUCTURE TREE GENERATOR

DXT (TM) DBASE EXPLORATION TOOLS
  * DFT (TM) DBASE FUNCTION TREE GENERATOR

FXT (TM) FORTRAN EXPLORATION TOOLS
  * FFT (TM) FORTRAN FUNCTION TREE GENERATOR

LXT (TM) LISP EXPLORATION TOOLS
  * LFT (TM) LISP FUNCTION TREE GENERATOR

Version March 1994

Copyright (C) Juergen Mueller (J.M.) 1988-1994.
All rights reserved world-wide.

DISCLAIMER OF WARRANTY

THIS SOFTWARE AND ACCOMPANYING WRITTEN MATERIALS (INCLUDING INSTRUCTIONS FOR USE) IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. IN NO EVENT WILL THE AUTHOR AND COPYRIGHT HOLDER BE LIABLE FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR OTHER DIRECT, INDIRECT, GENERAL, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OR INABILITY TO USE THIS PROGRAM (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, BUSINESS INTERRUPTION, LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) AND ON ANY THEORY OF LIABILITY, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.

ACKNOWLEDGEMENT

BY USING THIS SOFTWARE YOU ACKNOWLEDGE THAT YOU HAVE READ THIS LIMITED WARRANTY AND ACCOMPANYING REMARKS, UNDERSTAND IT, AND AGREE TO BE BOUND BY ITS TERMS AND CONDITIONS. YOU ALSO AGREE THAT THIS IS THE COMPLETE AND EXCLUSIVE STATEMENT OF AGREEMENT BETWEEN THE PARTIES AND SUPERSEDES ALL PROPOSALS OR PRIOR AGREEMENTS, ORAL OR WRITTEN, AND ANY OTHER COMMUNICATIONS BETWEEN THE PARTIES RELATING TO THE SUBJECT MATTER OF THE LIMITED WARRANTY.

You are expressly prohibited from selling this software or parts of it in any form, circulating it in any incomplete or modified form, distributing it with another product (except on CD-ROM) or removing this notice. No one may modify or patch any of the executable files in any way, including, but not limited to, decompiling, disassembling or otherwise reverse engineering this software in whole or part.

The documentation may be distributed verbatim, but changing it is not allowed.

The information and specifications in this document are subject to change without notice.

THIS VERSION OF THE DOCUMENTATION, SOFTWARE AND COPYRIGHT SUPERSEDES ALL PREVIOUS VERSIONS.

This software and documentation is Copyright (C) by

    Juergen Mueller
    Aldingerstrasse 22
    D-70806 Kornwestheim
    GERMANY

Email address:

    xmr@isw.uni-stuttgart.de
    xmr@iswfs2.isw.uni-stuttgart.de

There are no relations between the author's professional work and the SXT development. SXT is an independent private project of the author.

LICENSE

This version of the SXT Software Exploration Tools is NOT public domain or free software, but is being distributed as SHAREWARE.

Non-registered users of this software are granted a limited license for a 30-day evaluation period starting from the day of the first use to make an evaluation copy for trial use for the express purpose of determining whether this software is suitable for their needs. At the end of this trial period you should either register your copy or discontinue using this software.

The use of unregistered copies of this software, outside of the initial 30-day trial, by any person, business, corporation, government agency or any other entity is strictly prohibited.

This means that if you use this software, then you should pay for your copy. This software is NOT free, but you have the opportunity to try it before you buy it. Either pay for it, or quit using it.

A registration entitles you to use your copy of this software on any and all computers available to you. If other people have access to this software or may use it, then additional copies or a site license should be purchased.

All users are granted a limited license to copy this software only for the trial use of others and subject to the above limitations. This license does NOT include distribution, selling or copying of this software package in connection with any other product or service or for distribution in any incomplete or modified form.

Operators of electronic bulletin board systems and software servers (like INTERNET FTP-Servers) are encouraged to post this software for downloading by their users, as long as the above conditions are met.

This package is expected to be distributed by shareware and freeware channels, but the fees paid for "distribution" costs (e.g. disk, CD-ROM) are strictly exchanged between the distributor and the recipient, and the author makes no express or implied warranties about the quality or integrity of such indirectly acquired copies.

Distributors and users may obtain the package directly from the author by following the ordering procedures in the REGISTER files.

REGISTRATION REMINDER

Unregistered copies of this software are 100% fully functional. I make them this way so that you can have a real look at them, and then decide whether they fit your needs or not. This work depends on your honesty. If you use it, I expect you to pay for it. When you pay for the shareware you like, you are voting with your pocketbook, and will encourage me and others to develop more of these kinds of products.

THANK YOU FOR SUPPORTING THE SHAREWARE CONCEPT

TABLE OF CONTENTS

 1    THE SXT SOFTWARE EXPLORATION TOOLS
 2    GENERAL INTRODUCTION
 3    PROGRAM DESCRIPTION
 4    LANGUAGE IMPLEMENTATIONS
 4.1  C-LANGUAGE IMPLEMENTATION AND C-PREPROCESSOR
 4.2  C++ SOURCE CODE
 4.3  DBASE SOURCE CODE
 4.4  FORTRAN SOURCE CODE
 4.5  LISP SOURCE CODE
 4.6  ASSEMBLER SOURCE CODE
 5    DATABASE GENERATION
 6    PROGRAM LIMITATIONS
 7    IMPROVING EXECUTION SPEED
 8    COMMAND LINE SYNTAX DESCRIPTION
 9    OUTPUT DESCRIPTION AND INTERPRETATION
10    INTEGRATION INTO PROGRAM DEVELOPMENT ENVIRONMENTS
11    TOOLS FOR DATABASE PROCESSING
12    TROUBLE SHOOTING
13    FREQUENTLY ASKED QUESTIONS
14    REFERENCES
15    TRADEMARKS

APPENDIX 1: C-PRECOMPILER DEFINES
APPENDIX 2: RESERVED C/C++ KEYWORDS
APPENDIX 3: EFFICIENCY
APPENDIX 4: SYSTEM REQUIREMENTS
APPENDIX 5: INSTALLATION

1  THE SXT SOFTWARE EXPLORATION TOOLS

The SXT Software Exploration Tools are a collection of software analysis tools providing similar functionality for different programming languages. The following packages are currently available:

* CXT - C Exploration Tools:

    CFT - C Function Tree Generator
          Tool to analyse and display the function call relationships within the source code of C/C++ programs.

    CST - C Structure Tree Generator
          Tool to analyse and display the structure/class relationships within the source code of C/C++ programs.

* DXT - DBASE Exploration Tools:

    DFT - DBASE Function Tree Generator
          Tool to analyse and display the function call relationships within the source code of DBASE, CLIPPER, FOXBASE and other XBASE-like programs.

* FXT - FORTRAN Exploration Tools:

    FFT - FORTRAN Function Tree Generator
          Tool to analyse and display the function call relationships within the source code of FORTRAN programs.

* LXT - LISP Exploration Tools:

    LFT - LISP Function Tree Generator
          Tool to analyse and display the function call relationships within the source code of LISP and SCHEME programs.

Each of these packages consists of the analysis program and a recall program ("Navigator") to recall the analysis results which can be stored in a database, plus documentation and additional macros to integrate these tools into popular editors like BRIEF, QEDIT or MicroEMACS.

Each of these packages is available for the following systems:

* DOS real mode (shareware release)
* DOS 386 protected mode (registered users only)
* WINDOWS NT text mode (registered users only)
* OS/2 text mode (registered users only)

* IMPORTANT * IMPORTANT * IMPORTANT * IMPORTANT * IMPORTANT *

Although this document is mainly based on the description of the CXT programs CFT and CST (which were, up to version 2.13, the only publicly available SXT programs) and is therefore very C/C++ related, the description applies in the same way to all other SXT packages. The names CXT, CFT/CST and CFTN/CSTN can be exchanged for the corresponding names of the other products. Where necessary, the specific differences between the SXT packages are described. I have done it this way to ensure overall consistency, to keep all related things together and to reduce the effort of writing and maintaining this document.

2  GENERAL INTRODUCTION

The CXT programs are powerful program development, maintenance and documentation tools. They give the programmer the ability to analyse the source code of applications, no matter how big or complex they are. The CXT programs are also very useful for exploring unknown source code and for getting a complete overview of its internal structure. The re-engineering of old and/or undocumented source code becomes an easy task with these programs. The tools help the programmer to analyse, identify, locate and access all parts of a large software system. They are designed to support software reuse, maintenance and reliability.

By preprocessing, scanning and analysing the entire program source code as a single unit, these programs build an internal representation of the function call hierarchy (CFT) and of the data structure relations (CST). The resulting output shows, from a global perspective, the interdependencies and hierarchical structure between the functions or data types of the whole multi-file software project. Several features and options allow the user to customise the generated hierarchy tree chart output and to get a large set of useful information about the source code.

The hierarchy structure is always up-to-date because it relies on the original source code as the primary source of information. Written software documentation often differs from what has actually been coded, so the source code itself is the ultimate documentation.

An important feature is the database generation. It allows information to be recalled without reprocessing the source code. The database can be read in again by CFT and CST to produce different outputs or to add new files to the database.

Special recall programs called CFTN and CSTN allow fast searching for items in the database. These programs can be used within any environment, for example on the DOS command line or from inside editors like BRIEF, QEDIT or MicroEMACS (DOS and WINDOWS), to provide a full software project management system with access to all functions and data types with just a keystroke. These features make a comfortable "hypertext source code browser and locator" system out of your editor. A project consisting of several files appears to the developer as a single whole. The developer can walk through programs and trace the logic without having to memorize the directories and files where functions or data types are defined and called.

Displaying and printing a graphical representation of the analysis results as a call graph is not supported by the SXT programs, but owners of RATIONAL ROSE, a powerful software development CASE tool supporting the Booch Object-Oriented Analysis and Design (OOAD) method, can use this tool for such purposes. The SXT programs can generate compatible output which can be imported by Rational Rose. See option -RATIONAL for a detailed description.

Listings of all functions/data types and source files can be written as formatted ASCII text files and can be used as input for other programs like word processors or spreadsheet calculators.

A useful option of CST is the possibility to generate a source file with which size and byte offset calculations for structures/unions and their members can be performed. This option is especially useful to support any kind of error searching or hardware debugging, for example with an ICE, or if data structures have to be exchanged between different hardware platforms.

CFT can also be used to analyse "C"-like languages as they are used by several commercial programs. The macro programming languages of the BRIEF, EPSILON and ME editors are such languages and can be handled by CFT.

The resulting output files can be used for various purposes like development or documentation. There are no restrictions on using them for your own work.

CFT and CST have been used and tested since 1989 in several projects with applications ranging from single source files over medium sized projects (like CFT, CST and the other SXT tools themselves) up to very large software projects with hundreds of source and include files (mixed C and assembler code), more than 6 MB of source code, more than 200000 lines, 2000 functions and 500 data types. A lot of publicly available C/C++ sources (e.g.
GNU-C compiler, GNU-C library, GNU-EMACS, MicroEMACS, NCSA TCP/IP communication software package, SUIT - The Simple User Interface Toolkit, NIHCL - The National Institute of Health C++ class library, F2C Fortran-to-C translator, several projects from Dr. Dobb's Journal (DFLAT and BOB), Microsoft sample code (MFC 1.0 and 2.0)) were processed (with sometimes surprising results!) during the development and have been used to test and improve the features, reliability, correctness, robustness and execution speed of CFT, CST and their related utilities.

Although the other SXT packages are much newer than CFT and CST, they are all closely related. The CXT tools were used as the base for all other packages.

3  PROGRAM DESCRIPTION

CFT builds a hierarchy tree chart of every function with the called functions in its own function block. These functions are again used as a starting point for subsequent function blocks. Starting the tree chart with the "main"-function, it will display the complete function flow chart and the function hierarchy dependency of the whole application with all user defined functions and the called library functions. Prototyped but never defined or called functions are also detected. Recursive calls of functions are recognised and displayed, even over several call levels. Repeated calls of previously displayed functions in the output tree chart are detected and a message will be given with a reference to their first appearance. This prevents the output of complete subtrees displayed earlier. Overloaded C++ functions and operators are recognised and displayed with the number of overloadings.

CST works similarly to CFT but operates on data types like basic types, structures, unions, enumerations and C++ classes. CST builds a hierarchy tree chart of every structure and union data type with their internal elements and their related data types. If these data types are again structures, unions or classes, the substructures will again be displayed.
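The CFT tree chart behaviour described above (marking recursive calls and replacing an already displayed subtree by a reference to its first appearance) can be illustrated with a minimal Python sketch. It is not part of the SXT tools: the function name `render_call_tree`, the sample call table and the output layout are all assumptions made for this illustration and do not reproduce CFT's real output format.

```python
def render_call_tree(calls, root):
    """Return the lines of an indented call tree starting at `root`.

    `calls` maps a function name to the list of functions it calls.
    The layout is illustrative only, not CFT's actual output format.
    """
    lines = []
    first_seen = {}  # function name -> output line where its subtree was printed

    def visit(name, depth, stack):
        indent = "    " * depth
        if name in stack:
            # The function is still being expanded further up: recursion.
            lines.append(f"{indent}{name} (recursive)")
            return
        if name in first_seen:
            # Subtree already shown once: print a reference instead.
            lines.append(f"{indent}{name} (see line {first_seen[name]})")
            return
        lines.append(f"{indent}{name}")
        callees = calls.get(name, [])
        if callees:
            first_seen[name] = len(lines)
        for callee in callees:
            visit(callee, depth + 1, stack + [name])

    visit(root, 0, [])
    return lines


# Hypothetical call relations of a small program.
example = {
    "main": ["init", "run", "init"],
    "init": ["log"],
    "run": ["step", "log"],
    "step": ["run"],
}
tree = render_call_tree(example, "main")
print("\n".join(tree))
```

In this example `step` calls back into `run`, which is reported as recursive, and the second call of `init` from `main` is replaced by a reference to the line of its first appearance.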
CST recognises data types defined by 'typedef' and derived from other data types. The type names corresponding to the same basic type are displayed in the output file as 'alias' names for their common basic data type name. Every feature of CFT, like the detection of recursively declared structures and unions, references to previously displayed data types and others, is available and works similarly.

Every function (CFT) and data type (CST) can be displayed with the name of the source file and the line number where it is defined. The output can be customised to display the tree chart as a call-tree ("CALLER-CALLEE"-relation: "WHO CALLS WHOM") or as a caller-tree ("CALLEE-CALLER"-relation: "WHO IS CALLED BY WHOM"). This feature allows the user to determine which functions are called from a specific function or which functions are callers of a specific function.

The function and data type extraction from the source code is done by scanning and parsing the source. There is absolutely no need for the programmer to mark functions or data types of interest, for example with special keywords, by starting the definitions at the beginning of a line or by using comments containing special marks, as is necessary for other source code analysers and browsers. CFT and CST do not need these work-arounds; any source code can be processed without previous work. These tools are also compiler independent because they can be customised to support any kind of compiler.

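The difference between the call-tree ("WHO CALLS WHOM") and the caller-tree ("WHO IS CALLED BY WHOM") is simply the direction in which the same relations are read. The following Python sketch shows one way to derive the second view from the first. The helper `invert_calls` and the sample data are hypothetical and only illustrate the idea; nothing here is taken from the CFT implementation.

```python
def invert_calls(calls):
    """Turn a caller -> callees table into a callee -> callers table."""
    called_by = {}
    for caller, callees in calls.items():
        called_by.setdefault(caller, [])
        for callee in callees:
            callers = called_by.setdefault(callee, [])
            if caller not in callers:  # list each caller only once
                callers.append(caller)
    return called_by


# Hypothetical "WHO CALLS WHOM" relations.
calls = {"main": ["init", "run"], "run": ["step", "log"], "init": ["log"]}

# The same relations read as "WHO IS CALLED BY WHOM".
called_by = invert_calls(calls)
for name in sorted(called_by):
    print(name, "<-", ", ".join(called_by[name]) or "(no callers)")
```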
Several useful pieces of information and software metrics about the processed source code and the included files can be generated, such as

- file size and comment size in bytes for every file,
- number of source code lines for every file,
- number of included files for every source file,
- total effective number of scanned bytes and lines for every source file and its included files (if files are included multiple times, this will influence the calculations),
- for every defined function the number of lines, the code and comment size in bytes, the number of bytes per line, the number of functions called, the number of flow control statements (if, else, for, while, case, default, goto, return, exit), the maximum brace nesting level and whether the function is used only inside the file,
- for every defined structure/union the total number of elements and the number of elements which are themselves structures/unions,
- file function or data type reference list for every file,
- total number of displayed, defined, undefined or multiple defined functions and data types,
- location of all multiple defined functions and data types,
- location of all overloaded C++ functions,
- source file - include file dependencies for every source file,
- final statistical summary for all files,
- cross reference of every occurrence for every function or data type,
- parent/children relationship for every function and data type,
- critical function call path/structure nesting with deepest non-recursive nesting level (unlimited tree depth),
- C++ class inheritance graph,

and much more ...

The resulting hierarchy structure chart is another representation of a directed call graph. A directed call graph consists of nodes (functions or data types) and connections (call relations) between these nodes.
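As a small illustration of the directed call graph view, the following Python sketch counts the nodes and connections of a call table. It is only a sketch under assumptions: the name `graph_size` and the sample data are hypothetical, and counting a repeated call between the same two functions as a single connection is an assumption, since the exact counting rule of the SXT programs is not stated here.

```python
def graph_size(calls):
    """Return (number of nodes, number of connections) of a call table.

    Assumption: several calls from one function to the same callee
    are counted as one connection.
    """
    nodes = set(calls)
    edges = set()
    for caller, callees in calls.items():
        for callee in callees:
            nodes.add(callee)             # callees without own entry are nodes too
            edges.add((caller, callee))   # directed connection caller -> callee
    return len(nodes), len(edges)


# Hypothetical call relations; "main" calls "init" twice.
sample_calls = {
    "main": ["init", "run", "init"],
    "init": ["log"],
    "run": ["step", "log"],
    "step": ["run"],
}
node_count, edge_count = graph_size(sample_calls)
print(node_count, "nodes,", edge_count, "connections")
```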
The number of nodes and connections which are necessary to transform the hierarchy structure chart into a directed call graph will also be calculated as additional information about the system complexity.

A large number of options to control the program execution and the output generation are available and can be defined on the command line, by command files or by defining them in an environment variable used by the program.

CFT and CST can be directly invoked from inside editors or integrated development environments like the Borland C++ IDE. Detailed examples for the integration together with the necessary macro or batch files are given.

4  LANGUAGE IMPLEMENTATIONS

4.1  C-LANGUAGE IMPLEMENTATION AND C-PREPROCESSOR

The ISO/ANSI C language standard ISO/IEC 9899:1990 (E) resp. X3.159-1989-ANSI C as described in several books about the C-language (see references) was used as a development base. The reserved keywords being recognised are not only the original ISO/ANSI C keywords but were also taken from several compiler implementations like Microsoft, Borland or GNU and their own special language extensions.

The books "The C++ Programming Language" and "The Annotated C++ Reference Manual" (ARM) together with information about the work of the ANSI C++ committee X3J16 resp. the ISO/IEC working group SC22 WG21 were used for the C++ keywords. Another major source was the AT&T C++ release 2.1. Compiler specific extensions, especially from GNU, are also recognised. Proposed extensions to C++ like additional keywords (e.g. wchar_t) and the so called 'digraphs' will be supported if they are introduced into the C++ language standard.

A complete list of all reserved keywords is shown in appendix 2. The large set of keywords may lead to some slight problems in situations where a keyword is not used as itself but as an identifier name, for example a C++ keyword used as an identifier in C.

During a normal file scan, precompiler defines are, if possible, handled as if a real precompiler were present, but this can cause some trouble with '#if', '#ifdef' and other precompiler controls which are not evaluated. Also the block nesting level, which is monitored by the source code scanner, may not be at level 0 at the end of the file because of such precompiler controls. To avoid such things, a built-in C-preprocessor allows the complete preprocessing of the source code and include files for several compiler types as an additional option (-P).

Whether to preprocess or not is somewhat controversial. Without preprocessing, information can be lost if macros are used to change the program behaviour and hide function calls, and errors can occur during file scanning. With preprocessing, the function and data type information obtained from the code may not exactly correspond to the visible source code. Preprocessing can be an advantage or not, so the user has to decide whether to enable it.

The preprocessor handles the defines for Microsoft C 5.1, Microsoft C/C++ 7.0, Microsoft VC++ 1.0 for Windows NT (Beta Release June 1993), Turbo C++ 1.0, Borland C++ 2.0, Borland C++ 3.1, GNU-C and Intel 80960 C compiler iC960 3.0 and all memory models (not necessary for GNU-C and I960) or CPU architectures for the Intel 80960 32 bit RISC processor (KA, KB, SA, SB, MC, CA). Other compiler types can be customised with the -B and the -D options.

The default ISO/ANSI C predefined macros '__FILE__', '__LINE__', '__DATE__', '__TIME__' are generated for preprocessing. The macro '__STDC__' is NOT defined (some compilers test with '#ifndef __STDC__'), so that non-standard ISO/ANSI C extensions in the processed code are allowed. Defining '-D__STDC__=1' forces ISO/ANSI C conforming output (if used by the scanned source code, of course!). Additional supported precompiler defines are '__TIMESTAMP__', '__BASE_FILE__' and '__INCLUDE_LEVEL__'.

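The block nesting problem mentioned at the beginning of this section can be reproduced with a few lines of Python. The sketch below is a deliberately naive brace counter, not the CFT scanner: the function `final_brace_level` and the C fragment are hypothetical, and strings and comments containing braces are ignored for brevity. It skips precompiler control lines without evaluating them, so both branches of an '#ifdef' contribute their opening brace.

```python
def final_brace_level(source):
    """Count '{' and '}' line by line and return the level at end of file.

    Precompiler control lines are skipped but NOT evaluated, so the code
    of every conditional branch is counted.
    """
    level = 0
    for line in source.splitlines():
        if line.lstrip().startswith("#"):
            continue  # '#ifdef', '#else', '#endif', ... are not evaluated
        level += line.count("{") - line.count("}")
    return level


# Hypothetical C fragment: each branch opens the same block once.
SOURCE = """\
void f(int x)
{
#ifdef FAST
    if (x > 0) {
#else
    if (x != 0) {
#endif
        g(x);
    }
}
"""
print(final_brace_level(SOURCE))
```

A compiler sees only one of the two 'if' lines, so the braces are balanced for it. The naive scan sees both and therefore ends above level 0, which is the situation complete preprocessing is meant to avoid.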
A list of the predefined preprocessor defines for the supported compiler types is shown in appendix 1.

Features like the replacing of trigraphs and the recognition of C++ comments '//...' are also handled by the preprocessor.

The precompiler recognises several errors or possible sources of problems like

- the use of undefined variables in precompiler controls,
- misbalanced '#if...' control block(s) including the exact location (file, line) where the failing block started,
- recursively called include files,
- wrong number of macro arguments (missing ones or too many)

and displays diagnostic messages with an exact description of the error or warning reason and its location in the source file.

4.2  C++ SOURCE CODE

Although CFT and CST were initially not developed to process C++ code, it is possible to do so. In that case, however, some restrictions and limitations should be considered.

The recognition of C++ classes by CST is limited because the handling of the internal class structure items (variables and functions) is too complex to fit in the CST program. So classes are only referenced by name but their internal structure will not be scanned and displayed. The C++ class inheritance relationships are recognised and shown in a class hierarchy graph listing (option -b). Structures in C++ with function names as structure members will not be processed correctly. Templates are not supported and will not be recognised.

Calls of member functions will not be recognised correctly because the class name is missing; this also leads to an incomplete call tree. The use of overloaded functions with equal names but different parameters in C++ programs may lead to incorrect calling relationships. A variable initialization with parameters will be misinterpreted as a function call. A correct handling of these and other C++ features requires a complete C++ source code analyser to keep track of the class each function belongs to and the different calling parameters.

If precise information about C++ code is needed, utilities like 'class hierarchy browsers' or 'class viewers', which are usually (or should be) part of C++ compiler environments, should be used instead.

For the reasons described above, some care should be taken if C++ code is processed and displayed.

4.3  DBASE SOURCE CODE

DFT can process source code which is based on the DBASE III/IV programming language. This means that source code written in DBASE derivatives like CLIPPER or FOXBASE can also be analysed.

The source code analyser tries to be as correct as possible to build a reliable hierarchy tree.

A function/procedure declaration is recognised by the FUNCTION resp. PROCEDURE keyword. A function/procedure call is recognised by the following statements:

    function()
    CALL function
    CALL function WITH parameters
    DO function
    DO function WITH parameters

If a file contains no function/procedure declaration, the filename itself is taken as procedure name.

All tokens are assumed case-insensitive and are converted to upper-case characters.

4.4  FORTRAN SOURCE CODE

FFT can process source code which is based on the FORTRAN 77 standard.

Each FORTRAN line is divided into fields for the required information; each column represents a single character.

    COLUMN   FIELD
    1        comment indicator (C,c,*,!)
    1-5      label
    6        indicator for line continuation
    7-72     statement field (optionally up to column 132)

Continuation lines are merged before they are analysed. The number of continuation lines is 19 by default and can be varied between 0 and 99 (option -qn).

The standard intrinsic functions and additionally VAX-FORTRAN intrinsic functions are recognised.

All tokens are assumed case-insensitive and are converted to upper-case characters.

If option -I is set, INCLUDE statements are recognised and handled.
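The column layout and the merging of continuation lines described above can be sketched in a few lines of Python. This is an illustration only, not the FFT scanner: the names `split_fixed_form` and `merge_continuations` and the sample program are hypothetical. The rule that any character other than blank or '0' in column 6 marks a continuation line is taken from the FORTRAN 77 standard; the default of 19 continuation lines follows the text above.

```python
def split_fixed_form(line, width=72):
    """Split one line into (is_comment, label, is_continuation, statement)."""
    line = line.rstrip("\n")
    if line[:1] in ("C", "c", "*", "!"):       # column 1: comment indicator
        return True, "", False, line[1:].strip()
    padded = line.ljust(6)
    label = padded[0:5].strip()                # columns 1-5: label
    continuation = padded[5] not in (" ", "0") # column 6: continuation mark
    statement = line[6:width].rstrip()         # columns 7-72: statement field
    return False, label, continuation, statement


def merge_continuations(lines, max_continuations=19):
    """Merge continuation lines and return a list of (label, statement)."""
    statements = []
    extra = 0
    for raw in lines:
        is_comment, label, continuation, text = split_fixed_form(raw)
        if is_comment or not (label or text):
            continue                           # skip comments and empty lines
        if continuation and statements:
            extra += 1
            if extra > max_continuations:
                raise ValueError("too many continuation lines")
            prev_label, prev_text = statements[-1]
            statements[-1] = (prev_label, prev_text + text.strip())
        else:
            extra = 0
            statements.append((label, text.strip()))
    return statements


# Hypothetical fixed-form fragment with one continuation line.
PROGRAM = [
    "C     demo program",
    "      CALL INIT(A,",
    "     &          B)",
    "   10 X = F(Y)",
]
merged = merge_continuations(PROGRAM)
print(merged)
```

Passing `width=132` to `split_fixed_form` corresponds to the optional extended statement field.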
Two different types of include statements are accepted:

    C     TYPE 1: FORTRAN LIKE SYNTAX, INCLUDE STATEMENT STARTS IN
    C     COLUMN 7, FILENAME IN SINGLE QUOTATION MARKS
          INCLUDE 'FILENAME'

    C     TYPE 2: C LIKE SYNTAX, INCLUDE STATEMENT STARTS IN
    C     COLUMN 1 WITH #, FILENAME IN DOUBLE QUOTATION MARKS
    #INCLUDE "FILENAME"

The resulting function call graph may be incorrect due to the ENTRY capability of FORTRAN which allows direct jumps into a function/subroutine body. This may result in incorrect relationships for the ENTRY statement and the surrounding function/subroutine.

4.5  LISP SOURCE CODE

LFT can process LISP and SCHEME source code. The development of LFT was mainly based on the GNU-EMACS LISP dialect as it is used in the GNU-EMACS macro extension language, and its functionality was tested mainly with these macro files.

LISP functions/macros are recognised by the DEFUN and DEFMACRO keywords. SCHEME functions are recognised by the DEFINE keyword; SCHEME processing is enabled by option -XSCHEME. Unnamed functions declared with the LAMBDA keyword can be recognised optionally (option -XLAMBDA).

Tokens are assumed case-sensitive. Comments are recognised from ';' until end-of-line and between '#|' and '|#' as multi line comment blocks.

The source code analysis is performed in two passes. The first pass detects function/macro declarations and the second pass analyses the relationships.

Function calls via (funcall <fcn>), (function <fcn>), (apply <fcn>), (mapc <fcn>) and similar constructs may not be correctly evaluated if fcn is a function-symbol (e.g. given as a function parameter) and not a valid function name.

LFT was designed to work with different types of LISP source code (such as XLISP, CLOS, GNU-EMACS LISP, ...), although the large number of dialects may sometimes lead to unexpected problems.

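The two-pass analysis of LISP source code described above can be sketched as follows. This Python example is not the LFT implementation; the names `strip_comments`, `top_level_forms` and `lisp_calls` are hypothetical, only lower-case 'defun' and 'defmacro' (as used in GNU-EMACS LISP) are recognised, and strings or character constants containing parentheses or semicolons are not handled. A parameter that has the same name as a defined function would also be reported as a call.

```python
import re

DEFINITION = re.compile(r"\(\s*(defun|defmacro)\s+([^\s()]+)")
SYMBOL_AFTER_PAREN = re.compile(r"\(\s*([^\s()]+)")


def strip_comments(text):
    """Remove '#| ... |#' blocks and ';' comments (strings are not handled)."""
    text = re.sub(r"#\|.*?\|#", " ", text, flags=re.S)
    return re.sub(r";[^\n]*", "", text)


def top_level_forms(text):
    """Split the text into top-level parenthesised forms."""
    forms, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                forms.append(text[start:i + 1])
    return forms


def lisp_calls(source):
    forms = top_level_forms(strip_comments(source))
    # Pass 1: collect the names of all declared functions and macros.
    defined = set()
    for form in forms:
        match = DEFINITION.match(form)
        if match:
            defined.add(match.group(2))
    # Pass 2: inside each declaration, look for calls of declared names.
    calls = {}
    for form in forms:
        match = DEFINITION.match(form)
        if match:
            heads = SYMBOL_AFTER_PAREN.findall(form[match.end():])
            calls[match.group(2)] = sorted({h for h in heads if h in defined})
    return calls


# Hypothetical source with both comment styles.
LISP_SOURCE = """
; helper
(defun square (x) (* x x))
#| block
   comment |#
(defmacro twice (form) (list 'progn form form))
(defun main () (twice (print (square 3))))
"""
relations = lisp_calls(LISP_SOURCE)
print(relations)
```

The first pass is needed because a function may be called before its declaration appears in the file, as with forward references across several source files.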
4.6  ASSEMBLER SOURCE CODE

As an additional feature, CFT and FFT can process assembler source code for the Intel 80x86 processors (MASM 5.1, TASM) and for the Intel 80960 RISC processors (or any other "AT&T UNIX-like assembler" like GNU) to get information about assembler procedures and functions being called from the assembler source files. The assembler source code scanner also detects and handles calls of include files. This feature is useful for mixed language programming. The processing of assembler macros, however, is not supported; the preprocessing option (-P) works only with C source code.

Assembler source files are recognised by their file extensions '.ASM' and '.S'; there is no other way to force a file to be processed as an assembler file.

The following naming convention is used: For '.ASM' assembler files (MASM, TASM) all identifiers are treated as case-insensitive and will be transformed to lower case characters, but identifiers in '.S' (GNU, I960) assembler files are treated as case-sensitive. This means that an assembler function 'func1' defined in an '.ASM' file can be called from the source by 'func1', 'FUNC1', 'Func1' or any other lower and upper case character combination. If 'func1' is defined in an '.S' file, the name must match exactly. The first leading underscore of a function name will be removed to get exact naming matches. Type modifiers in C source code like 'cdecl' or 'pascal' will not be considered. Remember these conventions when processing C/FORTRAN and assembler files.

Assembler code statements (inline code) inside C source code will not be processed and will be skipped, because it is too difficult to handle the several kinds of syntax being used for this like 'asm ...', 'asm "..."' or 'asm(...)' and the different keywords ('asm', '_asm', '__asm', '__asm__', ...) used by various compiler implementations.

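The naming convention for assembler files can be summarised in a short Python sketch. The function `names_match` and the file names are hypothetical, and applying the underscore removal only to the assembler-side name is an assumption made for this example; the text above does not say on which side the underscore is stripped.

```python
import os


def names_match(c_name, asm_name, asm_file):
    """Decide whether a call of `c_name` refers to `asm_name` in `asm_file`."""
    is_masm = os.path.splitext(asm_file)[1].upper() == ".ASM"
    # Remove the first leading underscore of the assembler name only.
    if asm_name.startswith("_"):
        asm_name = asm_name[1:]
    if is_masm:
        # '.ASM' files (MASM, TASM): identifiers are case-insensitive.
        return c_name.lower() == asm_name.lower()
    # '.S' files (GNU, I960): the name must match exactly.
    return c_name == asm_name


print(names_match("Func1", "_func1", "io.asm"))  # case-insensitive match
print(names_match("Func1", "_func1", "io.S"))    # exact match required
```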
5  DATABASE GENERATION

One of the most important features provided by CFT and CST is the database generation, which can be enabled with the -G option. It is performed after writing the output file and saves all information about the processed files in a set of dBASE compatible database files (extension '.DBF') for later use.

These database files contain all necessary information like function or data type names, the location where they are defined, their caller/callee relationship, all scanned files with statistical information, include files and so on. An attempt was made to store the information in the most compact and effective database structure to save disk space. Note that if the contents of the database files are manipulated by external tools like dBASE or something else, the internal consistency will be corrupted and wrong or unexpected results will happen!

The database can be used to recall information, for example to find out if, in which file and on which line a specific function or data type is defined.

A previously generated database can be read into CFT and CST (option -g) to add new files to it and/or to produce another output file with new configuration options, for example with the reverse call tree or with only a specially selected item of interest displayed.

Such an incremental database generation is also useful if large projects can be divided into a set of commonly used files and project specific files. A good example of this is the GNU C compiler, which consists of a set of language independent files and three language dependent file sets for C, C++ and Objective-C. To analyse this software with CFT or CST, the language independent part can be stored in a database which is later reused for the language dependent parts to build the complete set of information.

The ability to retrieve information about the sources from the database is quite useful in many cases.
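Because the database files are dBASE compatible, their general layout can be inspected read-only with any program that understands the '.DBF' header. The following Python sketch parses the fixed 32-byte header and the field descriptors of the common dBASE III file layout. It is an illustration under assumptions: the function `read_dbf_layout` is hypothetical, and the field names 'NAME' and 'LINE' in the synthetic sample are invented, since the actual tables and fields written by CFT and CST are not documented in this section. The sketch never writes to a file, in line with the warning above about external manipulation.

```python
import struct


def read_dbf_layout(data):
    """Return record count, record length and field list of a DBF image."""
    # Bytes 0-11: version, date (YY MM DD), record count, header and record size.
    version, yy, mm, dd, records, header_len, record_len = struct.unpack_from(
        "<4BIHH", data, 0)
    fields = []
    pos = 32                                   # descriptors follow the header
    while pos < header_len and data[pos] != 0x0D:
        raw_name, ftype, length = struct.unpack_from("<11sc4xB", data, pos)
        name = raw_name.split(b"\0")[0].decode("ascii")
        fields.append((name, ftype.decode("ascii"), length))
        pos += 32                              # each descriptor is 32 bytes
    return {"records": records, "record_len": record_len, "fields": fields}


def make_field(name, ftype, length):
    """Build one 32-byte field descriptor (helper for the synthetic sample)."""
    return struct.pack("<11sc4xBB14x", name.encode("ascii"),
                       ftype.encode("ascii"), length, 0)


# Synthetic, empty table with two invented fields.
descriptors = make_field("NAME", "C", 32) + make_field("LINE", "N", 5)
header = struct.pack("<4BIHH20x", 0x03, 94, 3, 20, 0,
                     32 + len(descriptors) + 1, 1 + 32 + 5)
layout = read_dbf_layout(header + descriptors + b"\r")
print(layout)
```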
Recalling information from a database is much faster than processing all the sources again to find a specific item of interest. The documentation and maintenance of large software projects is much more effective and easier if the developer has a tool to navigate through the source code which helps in understanding the program and its internal structure. It is also useful for reverse engineering of source code to get an overview of the internal program structure. Together with user programmable editors it is possible to offer the user a source code browser with a hypertext-like feeling by integrating database recalling functions into the editors.

Two utility programs, called CFTN and CSTN, to retrieve information from databases are available with supporting macros for their integration into the BRIEF, QEDIT or MicroEMACS editor, which are described in another section later in this manual.

6  PROGRAM LIMITATIONS

First of all, CFT and CST cannot replace a compiler or a syntax checker like 'LINT' to detect errors in the source code. This means that it should be possible to compile the source code without fatal errors before it is analysed with CFT and CST, otherwise the processing results may be incorrect (and maybe the system crashes ...). However, there are some situations where CFT and CST can be useful to detect bugs and inconsistencies in the source code like

- multiple definitions of functions or data types,
- different function return types,
- implicitly declared functions with no prototype,
- function definitions used as prototype,
- recursive, nested, hidden and frequent calls of include files,
- unclosed strings or character constants,
- nested comments,
- misbalanced braces,
- unexpected end-of-file characters inside files,
- illegal characters in the source code,
- wrong number of macro arguments,
- missing macro arguments,
- misbalanced '#if...' control blocks.

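The first item of this list, multiple definitions of functions or data types, is a check that only becomes possible when all files are seen together. A minimal Python sketch of such a cross-file check is shown below. The function `multiple_definitions`, the file names and the line numbers are hypothetical and only illustrate the principle, not the way CFT and CST implement it.

```python
from collections import defaultdict


def multiple_definitions(definitions):
    """Return every name that is defined at more than one location.

    `definitions` is an iterable of (name, file, line) tuples collected
    from all scanned files.
    """
    by_name = defaultdict(list)
    for name, file, line in definitions:
        by_name[name].append((file, line))
    return {name: locs for name, locs in by_name.items() if len(locs) > 1}


# Hypothetical definitions gathered from three source files.
found = [
    ("main", "MAIN.C", 12),
    ("init", "SETUP.C", 40),
    ("init", "TOOLS.C", 7),
    ("log", "TOOLS.C", 55),
]
duplicates = multiple_definitions(found)
for name, locations in duplicates.items():
    print(name, "is defined in", locations)
```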
These code checks are done on multiple files in multiple directories so that inconsistencies between different files can be found and displayed. Conventional compilers, which work on only a single file at a time, cannot provide this capability and will therefore miss such inconsistencies (though the linker may find some of them).

Some statistical information about the source code may not be correct if preprocessing is enabled (-P). This affects all options which produce statistics, like the -p or -s option. The size of the 'pure' source code may not be correct due to macro expansion or the removal of unnecessary blanks. However, the file size is always correct because it is taken from the source file.

Most of the program limitations are caused by the limited available memory. This means that the more conventional main memory you have, the better. The real mode versions of CFT and CST use neither expanded or extended memory nor virtual memory management or disk file swapping, so keep your conventional memory free of memory consuming TSR programs and other utilities if you want to process a large number of files. Operating systems like MS-DOS 5.0 or DR-DOS 6.0 and memory managers like QEMM or 386MAX, which provide more free conventional memory, may help to handle big applications with a large number of files.

If memory problems still occur during processing, there is an easy way to break the memory limits: use the 32 bit protected mode versions of CFT and CST, called CFT386 and CST386. These programs run in protected mode, so they do not have these memory limitations and are faster than the real mode versions.

The number and the sizes of files to be processed are nearly unlimited, with a maximum of 2^14 (16,384) files and a maximum file length of 2^31 bytes (2 Gbytes). Each file can have 2^16 (65,536) lines. The number of functions and data types being handled is limited to 2^14. Note that these values are given for the real mode versions; the protected mode versions exceed them.
These limits should be sufficient even for the biggest projects imaginable.

The nesting of include files is limited by the number of files which can be opened simultaneously (operating system or compiler dependent). The ISO/ANSI C minimum for include file nesting levels is 8; CFT and CST fulfil this requirement.

The integrated C preprocessor limits the size of expanded macros to 6 Kbytes. The number of macros simultaneously defined is unlimited (ISO/ANSI: 1024) and only affected by the available memory. The number of macro parameters is limited to 31 (ISO/ANSI: 31) and up to 31 significant characters (ISO/ANSI: 31) are recognised. The nesting of '#if...' conditional compilation control blocks is limited to 32 levels (ISO/ANSI: 8). The line length is unlimited (ISO/ANSI: logical line length is 509 characters). The number of characters in a string (including '\0') is 2048 (ISO/ANSI: 509). The number of members in one structure/union is unlimited (ISO/ANSI: 127), and the number of structure/union nesting levels is unlimited (ISO/ANSI: 15).

The recognition of identifiers like function and variable names follows the standard rules: an identifier consists of upper and lower case letters (A-Z, a-z), underscore (_) and digits (0-9); additionally the dollar sign ($) is accepted. National character set extensions as they are usual for languages in European countries like Germany, Denmark or Sweden can be defined with option -J.

C++ comments '//...' are normally only recognised if option -C++ is set. However, to accept the non-standard extension of some compilers which also allow such comments in C source code, option -// can be used. Nested C style comments '/*...*/' are not allowed and will always produce warnings.

The calculation depth of the critical function call path or structure nesting level is unlimited. The calculation is an extremely recursive function and was successfully tested up to 115 nesting levels.
It is not known at which nesting level a stack overflow will occur.

CFT cannot recognise and reference a function if it is used by its bare name, without parentheses. This happens if a function name is assigned to a function pointer variable or used as a function pointer argument in a function call. Indirect calls to a function via a function pointer cannot be resolved.

CFT will be confused in some rare cases by extensive type-casting operations like 'void __based(void) * __cdecl ... ()' and will display unexpected messages.

A function prototype declaration inside a function block ('function given scope') will not be recognised by CFT.

In assembler source code, some definitions of local variables look like a function or a label definition and are treated by CFT as such, although this may be wrong in some cases. It is also not always possible to detect a call of a local label correctly.

CFT sometimes displays warning messages about 'return type mismatch' although the code may be correct in that special case because the different types were defined earlier by a 'typedef' declaration. The reason is simply that CFT doesn't recognise these 'typedef's (but CST does!); it looks only for function names.

An often requested feature for CST is the calculation of structure/union sizes with byte offset information for every structure/union member. This feature is not implemented in CST although it would be possible because all necessary information is present. The reason is that there would be too much overhead for CST to handle the various compiler implementations with their different basic type sizes (sizeof(int), sizeof(long double)) for different processor types (16 bit, 32 bit, 64 bit, ...) and data type alignment requirements (by default and also controlled with #pragmas like 'align' or 'pack'). It would be possible to do this for just one selected compiler implementation or processor type but not for a great number of them.
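The dependence of member offsets on alignment rules can be made visible with a small sketch. It uses the Python standard 'struct' module, whose "@" mode follows the native sizes and alignment of the C compiler that built the interpreter and whose "=" mode uses fixed sizes with no alignment, comparable to a packed structure. The helper name 'member_offsets' and the example members are assumptions made for illustration only; this is not how CST works.

```python
import struct

def member_offsets(codes, mode="@"):
    """Return (offsets, size) for members given as struct format codes.

    mode "@" uses the native sizes and alignment of the C compiler that
    built Python; mode "=" uses standard sizes with no alignment at all.
    """
    offsets = []
    layout = ""
    for code in codes:
        # A zero-repeat item only aligns: it yields the offset of the next member
        offsets.append(struct.calcsize(mode + layout + "0" + code))
        layout += code
    # struct adds no padding after the last member, unlike most C compilers
    return offsets, struct.calcsize(mode + layout)


members = ["c", "i", "c"]               # char, int, char
native = member_offsets(members)        # platform dependent result
packed = member_offsets(members, "=")   # no fill bytes between members
```

In packed mode the 'int' member starts at offset 1. In native mode it typically starts at offset 4 on machines with 4 byte integers, because fill bytes are inserted in front of it. The same source declaration therefore yields different layouts, which is exactly the variety CST would have to model for every compiler.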
Compilers for advanced architectures like RISC processors in particular have very complicated type alignment rules which depend on the data types, alignment pragmas, compiler switches, type sizes, the number and sizes of available registers and the resulting structure/union/class sizes in order to generate highly optimised code. This usually includes the insertion of 'fill' bytes inside a structure/union and sometimes 'padding' bytes at the end of a structure/union to force aligned sizes on specific byte boundaries (for examples see the reference manual of the Intel 80960 C compiler iC960, release 3.0).

For these reasons, an integrated 'byte offset calculation' is not implemented in CST. Instead, you can generate a source file for selected data types with option -O which performs these calculations when the generated file is compiled with your C compiler. For further information see the description of option -O.

SUMMARY

The limitations described above can lead in some situations to misinterpretations or loss of information about the scanned source code. The only way to avoid these shortcomings would be to include parts of a 'real compiler' to handle the complete C and C++ syntax in any possible situation. But this was not the intention when the development of these programs as 'little' and easy to use general purpose programming support tools began. Nevertheless, I hope that CFT, CST and the other SXT programs will in most cases be powerful and useful development and documentation tools!

7 IMPROVING EXECUTION SPEED

CFT and CST are disk storage based programs because the source and include files, the intermediate precompiler file and the output file must be read from and written to hard disk. This means that the execution speed of CFT and CST depends primarily on the speed of the physical storage medium and not (only) on the speed of the CPU.
There are several ways to improve the program performance:

- install a RAM-disk and
  a) start CFT and CST from there so that the intermediate file and the resulting output file are stored there (but don't forget to copy the output file to the hard disk before power-off), or
  b) use the -v option to redirect only the precompiler output file (scanner input file) to the RAM-disk from wherever the program is started (the RAM-disk must be large enough to hold the largest possible temporary file, otherwise a disk-write error will occur),
- use a hard disk cache program like SmartDrive, HyperDisk or PC-Cache,
- use a faster hard disk,
- and finally, of course, use a faster and more powerful CPU.

The most effective combination is option -v with a RAM-disk as destination path and hard disk caching together with a fast hard disk drive. If the disk cache is large enough to hold most of the frequently used include files, execution is about 2.5 to 3 times faster than without a cache. This is a significant speed-up, especially for projects with a large number of files and many included files in each source file.

During program execution with preprocessing (option -P), most of the time is spent preprocessing the given input files and the related include files and generating the preprocessor output file. The scanning for functions (CFT) or data types (CST) takes only a small amount of time. The function/data type relations are computed while the output is generated and written to disk; no precomputing is necessary.

The function for critical call path/nesting level detection depends only on the number of functions or structures and not on the call/declaration nesting complexity. The execution time grows linearly with the number of items (functions/structures) to process and is very fast!
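One way such a nesting level calculation can stay close to linear is to compute the level of every item only once and to reuse the stored result. The sketch below illustrates that idea with a recursive, memoised depth calculation. It is an assumption about a suitable technique, not the algorithm actually used by CFT and CST, and the call table is invented.

```python
def nesting_levels(calls):
    """Map every function to the depth of the deepest call chain starting at it."""
    levels = {}      # finished results, each function is computed only once
    active = set()   # functions on the current path, used to stop at recursion

    def level(name):
        if name in levels:
            return levels[name]
        if name in active:
            return 0  # recursive call: do not follow the cycle again
        active.add(name)
        deepest = max((level(callee) for callee in calls.get(name, ())), default=0)
        active.discard(name)
        levels[name] = deepest + 1
        return levels[name]

    for name in calls:
        level(name)
    return levels


# Hypothetical call table: caller -> list of callees
calls = {
    "main": ["parse", "report"],
    "parse": ["read_token"],
    "read_token": [],
    "report": [],
}
levels = nesting_levels(calls)
critical = max(levels, key=levels.get)   # start of the deepest call chain
```

For this table 'main' has nesting level 3 and is the start of the critical path. Like the calculation described above, the sketch is recursive, so very deep call chains are bounded by the stack (in Python, by the interpreter's recursion limit).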
Be aware that the processing of a large number of files can take quite a long time (from several minutes up to hours on lower performance machines!), especially if option -P for preprocessing is enabled. The generation of the output file and writing to disk can also take some time if the number of items to display is large and the nesting structure is complex or if no cross reference option is enabled (see -x and -r for further information). If the number of items is very large, one of the most time consuming options is the function/data type file reference (option -z). The writing and reading of the database files (options -G and -g) also takes some time due to the large amount of information.

Don't panic if there seems to be no disk access for a longer time. The reason is simply that there may be time consuming computations and that the output is buffered internally to reduce the number of disk accesses and thereby speed up the output!

For more detailed information about the program efficiency see appendix 3.

8 COMMAND LINE SYNTAX DESCRIPTION

The SXT programs are command-line driven. This section gives a complete overview of all command line options and their syntax. It also gives remarks on their use and shows several examples with detailed descriptions. The command line options are case-sensitive! There are no differences between the real mode and the other versions of the SXT programs. For every option, the SXT programs which support it are listed in parentheses.

This section of the documentation should be read very carefully by all users to get a complete overview of all the features which are provided.

THE OPTIONS ARE LISTED IN LEXICOGRAPHICAL ORDER.
NONE OF THE OPTIONS IS SET BY DEFAULT.
SYNTAX:

  CFT [options [$cmdfile]] <[+]file> <@filelist>
  CST [options [$cmdfile]] <[+]file> <@filelist>
  DFT [options [$cmdfile]] <[+]file> <@filelist>
  FFT [options [$cmdfile]] <[+]file> <@filelist>
  LFT [options [$cmdfile]] <[+]file> <@filelist>

OPTIONS: (valid for)

-Bsizes (CFT, CST)
Redefine the basic type sizes and pointer type sizes (all values must be given in bytes) for conditional preprocessor controls with the 'sizeof()' keyword like '#if sizeof(int) == 4'. This option is only valid with the -P option. The required format for this option is

  -Bv,c,s,i,l,f,d,ld*data,code

(the delimiter between data and pointer sizes is '*') with the following types and their respective default data size values in bytes (the pointer type sizes are model dependent):

  v    : void (sizeof(void) is usually 0, but for GNU-C it is 1)
  c    : char (1 byte)
  s    : short (by definition 2 bytes, hardware independent)
  i    : integer (hardware dependent, 2 or 4 bytes)
  l    : long (4 bytes)
  f    : float (4 bytes, IEEE format)
  d    : double (8 bytes, IEEE format)
  ld   : long double (10 bytes, IEEE format; some compilers assume long double == double (= 8 bytes), and some CPUs and their compilers have special alignment requirements, like the Intel 80960, where sizeof(long double) is 16 bytes due to register and memory access requirements and structure alignment)
  data : data pointer (type pointers, 2 or 4 bytes, memory model dependent)
  code : code pointer (function pointers, 2 or 4 bytes, memory model dependent)

The sizes of signed and unsigned types of the same basic type are considered equal. This means that, for example, the following expression is true:

  sizeof(unsigned int) == sizeof(signed int) == sizeof(int)

The sizes of all type pointers to data are considered equal, as are the sizes of all function pointers to code. This means that, for example, the following expressions are true:

  sizeof(int *) == sizeof(float *)
  sizeof(int (*)()) == sizeof(float (*)())

A 64 bit (8 bytes) integer type like 'long long' or 'bigint' (or something else) is not supported because no C compilers are known to me which use such a type, although some (co-)processors and their assemblers are able to handle it (see the Intel 80960 assembler manual for examples).

If the -B option is not set, the default values for the various memory models and compiler types (as far as they are known to me) are used; the assumed target hardware has an Intel 80x86 microprocessor. Note that during preprocessing type modifiers like "near" or "far" are not recognised. If neither the -B nor the -T option is set, the sizes of data pointers and code pointers are always considered equal:

  sizeof(int *) == sizeof(int (*)()) (= 4, large model)

For example, -B0,1,2,2,4,4,8,10*4,4 would be the correct declaration for MS-C 7.0, large/huge memory model, with the values for data types (void = 0, char = 1, short = 2, int = 2, long = 4, float = 4, double = 8 and long double = 10 bytes) and for pointers to data types and function pointers (both 4 bytes). These values are set automatically by defining -TMSC70,L (or -TMSC70,H) as compiler type and memory model description for preprocessing.

-C++ (CFT, CST)
Enable C++ source code processing. This includes the handling of C++ comments '//...', the recognition of C++ keywords and the definition of the macro name '__cplusplus' for preprocessing. If a supported compiler defines additional macro names like '__TCPLUSPLUS__' for Turbo-C, they will also be defined before preprocessing. Option -C++ is strictly required to process C++ code correctly.

-C[s] (CFT, CST, DFT, FFT, LFT)
List the function/data type contents for every processed file; 's' sorts by line numbers (DEFAULT ORDER: lexicographical). Additional information is available with the option -s. CFT reports if none of the functions defined in a file is called from functions defined in other files (internal versus external linkage).
Functions for which no external caller outside the file is found will be marked [INTERNAL]; such functions are candidates for being defined as 'static'. Attention: calling a function via a function pointer won't be noticed! This information is useful to find out whether the contents of a file are unnecessary for the project so that the file need not be linked. This option gives useful information about source code metrics for every defined function.

-D[..] (CFT, CST, DFT, FFT, LFT)
Specifies macro name(s) (-Dname or -Dname1=name2) or a file with macro names (-D@namelist) of functions/data types which should be predefined and linked together; they are also used as preprocessor defines if the integrated preprocessor is called (-P). The defined names are case sensitive and trigraph translation is performed on them.

The definition of a string as replacement for a macro name differs between the command line and a macro definition file or command file (marked with '$'). On the command line, the double quotation marks must be 'escaped' and the string must be quoted like '-DXYZ="\"123\""' (similar to C strings) to work correctly; the reason is the DOS wildcard expansion of the command line. Inside a macro definition or command file, the double quotation marks need not be 'escaped', so the definition can be written like '-DXYZ="123"'.

This option cannot be used in environment defines if the equal sign '=' is used because this produces a syntax error for DOS when trying to store a 'SET=...' command with a second equal sign in one line. If a define item consists of two words, see the notes at option -S for a description. Keep these differences and exceptions in mind to avoid unexpected results when using the -D option.

-Ename (CFT, CST, FFT)
Almost the same as -I, but the path for the include files is taken from the environment variable 'name'. Typing -EINCLUDE would produce the same results as -I alone.

-E[..] (LFT)
Specifies name(s) (-Ename) or a file with names (-E@namelist) of external (builtin) functions. Useful if GNU-Emacs Lisp source code is scanned, to reduce the number of undefined functions listed in the output file. A list of GNU-EMACS (version 18.59) builtin functions is given in the file GNULISP.FCT.

-F (CFT, CST, DFT, FFT, LFT)
Use only ASCII characters for the tree chart output instead of the DEFAULT semigraphic characters. This option is useful if the generated output file is to be printed on a printer which does not support semigraphic characters as they are defined in the IBM character set. It can also be used to prepare the output file for use in a WINDOWS application like MicroEMACS if no font with semigraphics is available.

-G[name] (CFT, CST, DFT, FFT, LFT)
Generate a database with the complete set of information about the processed sources. The additional parameter 'name' (path and file name) is used as a unique base name for the set of database files (up to 6 significant characters); the DEFAULT NAME 'CXT' is used if no name is specified. If 'name' ends with a (back-)slash, it is used as a path name. The generated database files (extension '.DBF') are dBASE compatible.

Two additional files are created, one with the command line options (extension '.CMD') and one with a list of the source files (extension '.SRC') used for database generation. They can be used as command line definition files with '$' (command list) and '@' (file list).

As a result of the database generation you will find files named 'CXTxy.ext' (default name 'CXT') or 'namexy.ext' (user defined 'name'), where 'x' is 'F' for CFT or 'S' for CST and 'y' is replaced by an internally used character to mark the different database files and their contents.

-H[elp] (CFT, CST, DFT, FFT, LFT)
See option -?.
-I[path] (CFT, CST, FFT)
This option enables the scanning of include files declared with '#include "..."' or '#include <...>' or with a similar syntax for FORTRAN. The required path for the include files is taken from the INCLUDE environment variable (DEFAULT BEHAVIOUR) or can be user defined by 'path'. Paths defined with -I are searched before any other paths taken from environment variables specified by -E or -P, so care should be taken with this option.

Include paths can be given either absolute or relative. A relative path is always considered relative to the directory of the source file it is used with, not to the directory from which the analysis is started or in which the analysis program is located.

Specifying -I* ignores missing include files during preprocessing (-P). This is a 'quick and dirty' approach, but it can sometimes be useful if include interrelations or locations are unknown. However, the results may not always be correct.

Using the -I or -E option without -P allows the scanning of the source file and the included files without preprocessing. In that case an include file is handled as if it were a completely new file; this can lead to errors if a file inclusion is specified within a function or structure. Also, preprocessor controls like '#if ...' are not evaluated, which can lead to unexpected results.

-Jcharset (CFT, CST, DFT, FFT, LFT)
Extend the C/C++ character set (a-z, A-Z, 0-9, _, $), which is used by DEFAULT for identifier recognition, with a user defined character set 'charset'. This option allows the programmer to use national character sets as they are common in Germany, Denmark, Sweden and other European countries. All characters must be specified within one -J option.

-L[L][+] (CFT, CST, DFT, FFT, LFT)
Redirect the screen output to a file, called 'CFT.LOG' or 'CST.LOG' respectively. If '+' is set, the output is both written to the screen and redirected to the log file so that the output messages can be viewed as they appear and analysed later.
Finally, -LL or -LL+ appends the output to an existing file; this can be useful if CFT and CST run in batch jobs.

-M (CFT, CST, FFT)
This option generates a source file/include file dependency table for every processed file. This table shows the include files on which a source file depends and can be used for a MAKE file. It is also useful to check whether the included files are taken from the correct directories. If a file is included more than once, the number of inclusions is displayed.

-N (CFT, CST, DFT, FFT, LFT)
Disable the writing of an output file. This option can be useful if, for example, only a database (option -G) is to be generated with CFT or CST and no output file is required. In that case the sometimes very time consuming process of output file writing is skipped. Note that for CST the writing of the byte offset file 'CST_OFFS.C' is not affected by this option.

-O[..] (CST)
Specifies name(s) (-Oname) or a file with names (-O@namelist) of data types for which the calculation of structure/union sizes with byte offset information for every data type member should be performed.

Additionally specifying -O+ sets a flag for the recursive collection of sub-structures during expansion, which are then displayed without being specified by -O. This means that if a structure/union consists of members which are also structures or unions (and so on), it is not necessary to specify all these data type names with -O to enable them for byte offset calculation. Instead, you only have to specify the topmost data type with -Oname and additionally -O+ to force CST to select all related sub-types for display. If -O+ is set but NO names are specified, ALL structures and unions will be used for byte offset calculations!

As the result of this option, CST generates a C source file, called 'CST_OFFS.C'.
This file needs some additional editing to declare necessary include files, data types, defines or pragmas before it can be compiled with the C compiler for which the file was generated (be sure to use the same includes!). The resulting executable prints for every structure/union member the byte offset relative to the beginning of the structure/union (decimal and hexadecimal) and the size of each member, the resulting structure/union size and also information on whether a structure/union member has been aligned (= compiler dependent insertion of fill bytes before that member) or whether the structure/union was padded with fill bytes at its end to align the size to a specific length.

To obtain this information and to perform the necessary calculations, the source file 'CST_OFFS.C' can become very large and makes use of the C macro programming capabilities, which may lead in some rare cases to errors during compilation due to the internal limitations of some C compilers.

The -O option is very useful if you need detailed information about structures/unions for error searching and debugging, especially for hardware debugging with an ICE. It is also useful for finding out the differences in the internal layout of a structure/union when porting C source code between different compilers and/or operating systems or when data structures are exchanged between different hardware platforms, for example with data communication. You can verify whether the expected structure/union layout and size is really produced by the target compiler.

-P[name] (CFT, CST)
Run the integrated C preprocessor before the file scan. In this case the include path is taken from the INCLUDE environment variable (DEFAULT BEHAVIOUR) or from the user defined environment variable 'name', and additional paths from the -I and -E options are used.
If special paths should be searched before the default paths, they must be specified by the -I path or the -E environment option and they must be placed on the command line before the -P option to be processed first. The -D and -U preprocessor defines, the -T type and memory model and the -B size information are also used, if defined. The path for the preprocessor output file can be specified with the -v option, otherwise the current working directory is used (DEFAULT BEHAVIOUR). The comments in the source and included files are kept unless -q is specified to remove them. The comments are used for statistics with option -p. If option -C++ is set, the macro '__cplusplus' is predefined before preprocessing to enable C++ macros and C++ comment recognition.

If you are using a compiler which is not supported by CFT and CST, or the built-in preprocessing doesn't satisfy your needs because the results seem to be different from those of your preprocessor, you can preprocess the files you want to analyse with your own compiler's preprocessor and use these preprocessed files as input for CFT and CST.

-R (CFT, CST, DFT, FFT, LFT)
By default, CFT and CST generate the hierarchy tree chart of the called functions/data types ("CALLER:CALLEE relation", "WHO CALLS WHOM"). The -R option produces an inverted listing showing the callers/users of each function/data type. It generates the output as the function/data type hierarchy member list tree chart in reverse order, as a list of calling items of the referenced basic item ("CALLEE:CALLER relation", "WHO IS CALLED BY WHOM"). This option is useful to get the relations between functions/data types and their callers/users.

-RATIONAL (CFT, CST, DFT, FFT, LFT)
This option generates a so-called 'Petal' file for Rational Rose 2.0 for MS-Windows 3.1, a CASE tool supporting the Booch Object-Oriented Analysis and Design (OOAD) method.
The generated output file can be imported by Rational Rose to use its builtin capabilities for describing and visualizing Finite State Machines (FSM), which are in this case (mis-)used to visualize graphically the calling relationships of functions or data types.

If you have Rational Rose 2.0, perform the following steps to get impressive results: Start Rational Rose, select a new model ('File' - 'New') and import the generated file ('File' - 'Import...'). If successful, a class diagram with one class symbol named 'CallGraph' appears. Click on that symbol and choose 'Browse' - 'State Diagram'. In the state diagram select 'Tools' - 'Layout' to start the layout optimization function. As the result, the graphical call tree of the source code analysis is displayed with each function/data type shown as a circle ('state') and the call relationship shown as an arrow ('transition') from the calling to the called item, for classes from the superclass to the subclass. You can zoom into the diagram, print the results or incorporate the diagrams into your program documentation via the Clipboard, e.g. into MS-Word-for-Windows.

This option is available for all SXT programs. The generated files are named 'CFT.PTL', 'CST.PTL', 'DFT.PTL' and so on. CST generates an additional file named 'CSTCLASS.PTL' describing the class inheritance relationships.

The -RATIONAL option is a work-around for the missing graphical layout capabilities of the SXT programs (which some users have requested in the past): it uses an external program to provide the missing features. This option was tested with Rational Rose 2.0 Beta for MS-Windows 3.1. Note that even for small and medium sized projects Rational Rose needs some time to import the file and process the FSM layout.

-S[..] (CFT, CST, DFT, FFT, LFT)
Specify a name (-Sname) or a file with names (-S@namelist) of functions/data types to search for and to dump if present; names are case sensitive.
These items are listed first in the output tree chart file. When using -S on the command line, it is necessary to surround a data type name that consists of two words with double quotation marks like "struct _iobuf" to connect the two words. This is not necessary inside a list file, but there every search name must be on a separate line.

-Tn (FFT)
Set the tabulator expansion size to 'n' (DEFAULT: 8 characters).

-Ttype,m (CFT, CST)
Use this option to set the compiler type for source code preprocessing to one of the following types:

  MSC51    Microsoft C 5.1
  MSC70    Microsoft C/C++ 7.0
  MSVCWNT  Microsoft VC++ 1.0 for Windows NT
  TC10     Borland Turbo C++ 1.0
  BC20     Borland C++ 2.0
  BC31     Borland C++ 3.1
  BC10OS2  Borland C++ 1.0 for OS/2
  GNU      GNU-C
  I960     Intel 80960 iC960 3.0

The supported memory models are T(iny) (valid only for MSC70, TC10, BC20, BC31), S(mall), M(edium), C(ompact), L(arge) and H(uge); 'L' is assumed as default if no model is specified. MS VC++ for Windows NT, Borland C++ for OS/2, GNU-C and Intel iC960 do not need a memory model because they generate true 32 bit code. The Intel iC960 compiler requires the definition of the 80960 RISC processor architecture, which is one of KA, KB, SA, SB, MC, CA (default is KB).

This option causes several compiler dependent preprocessor macros (as far as they were known to me) to be defined before preprocessing starts. This option can only be used with the -P option, otherwise it has no effect.

If your compiler is not supported, you can perform the following steps: find out which preprocessor defines are necessary (manual, help file) and declare them with option -D, then declare the type sizes with option -B, depending on the selected memory model or processor architecture.

-U[..] (CFT, CST)
Specifies a predefined macro name (-Uname) or a file with predefined macro names (-U@namelist) to be undefined for preprocessing. Note that the default predefined macro names '__FILE__', '__LINE__', '__DATE__', '__TIME__' cannot be undefined.
All other predefined names for the various compiler types can be undefined. As for -D, the names are considered case-sensitive, but trigraph translation is not performed because the internal representation cannot contain trigraphs.

-V (CFT)
List prototyped functions which are neither called nor defined (option -a and -u). This option is useful to find unused function prototypes which could be removed from the source code.

-Wlevel (CFT, CST, DFT, FFT, LFT)
Set the error and warning message level. Higher warning levels include lower ones. The DEFAULT level is always the highest supported warning level. Possible levels are:

  0 : all error and warning messages are suppressed except absolutely catastrophic fatal errors,
  1 : display serious errors or warnings,
  2 : includes level 1 plus additional errors and warnings,
  3 : includes level 2 plus errors/warnings/remarks,
  4 : includes level 3 plus warnings about implicitly declared functions and missing type or storage class.

The following levels affect only preprocessing (CFT and CST):

  5 : includes level 4 plus warnings and errors during preprocessing (non-fatal errors and warnings during preprocessing are otherwise not displayed, the preprocessor is running in 'silent mode'),
  6 : includes level 5 plus remarks/slight warnings during preprocessing.

The output format for messages during file scan is

  file name(line): error: description
  file name(line): warning: description

and during preprocessing (warning levels 5 and 6)

  preprocessor: file name(line): error: description
  source line
  preprocessor: file name(line): warning: description
  source line

-X (CFT, CST, DFT, FFT, LFT)
Assume a UNIX-style text file: no CR, only LF. The DEFAULT ASSUMPTION is a DOS-style text file with CR+LF. Any other combination like CR in UNIX files, CR without following LF or LF without preceding CR in DOS files will cause a warning message.
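The line-ending rules of option -X described above can be sketched as a small check over the raw bytes of a file. This is an illustration only, not the code used by the SXT programs; the function name, the message texts and the decision not to count a lone CR as a line break are assumptions.

```python
def check_line_endings(data, unix_style=False):
    """Return (line, message) pairs for unexpected CR/LF combinations."""
    warnings = []
    line = 1
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if byte == b"\r":
            paired = data[i + 1:i + 2] == b"\n"
            if unix_style:
                warnings.append((line, "CR in UNIX-style file"))
            elif not paired:
                warnings.append((line, "CR without following LF"))
            if paired:
                line += 1   # CR+LF counts as one line break
                i += 1      # skip the LF that belongs to this CR
        elif byte == b"\n":
            if not unix_style:
                warnings.append((line, "LF without preceding CR"))
            line += 1
        i += 1
    return warnings
```

With the default DOS assumption, the input b"a\r\nb\nc\rd\r\n" yields one warning for the lone LF on line 2 and one for the lone CR on line 3. With unix_style=True, every CR is reported instead.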
This option is useful to detect possible conversion errors between different operating systems or incorrect editor configuration settings.

-XLAMBDA (LFT)
Recognize the LISP or SCHEME keyword 'lambda' for unnamed function declarations. By DEFAULT, 'lambda' is treated as a simple identifier.

-XSCHEME (LFT)
Assume SCHEME source code instead of LISP source code (DEFAULT). This means that functions are recognised by the 'define' SCHEME keyword instead of the 'defun' or 'defmacro' LISP keywords.

-Y (CFT, CST, DFT, FFT, LFT)
Ignore CR+LF checks. This option disables all checks which are done for unexpected CR+LF combinations in DOS or UNIX files. If option -Y is set, option -X will be ignored. This option can be useful if there would be too many messages concerning that error or if this message would be of no interest for the user.

-Z[s] (CFT, CST, DFT, FFT, LFT)
Display every caller and member for each function/data type, 's' sorts by the number of calls (DEFAULT ORDER: lexicographical). This is an extension of the -c option. This option shows the relations in the following form:

    List of parent functions/data types:
      1. caller (reference #) <# of calls from>
      ...
      n. caller ...
    function/data type (reference #) <# of calls from parents, # of calls to children>
    List of child functions/data types:
      1. called member (reference #) <# of calls to>
      ...
      m. called member ...

This compact form lists all callers and members with the number of their calls. Recursions are detected and displayed.

-a (CFT, CST, DFT, FFT, LFT)
List every function/data type, also previously referenced functions/data types. This generates a complete list of every function/data type in lexicographical order with references to their first location.

-b (CST)
Display the C++ class inheritance relationships. This option generates two listings. The first one displays the complete C++ class hierarchy graph(s).
The second one shows for each class first the superclasses from which the class inherits and the access restrictions (public, protected, virtual, ...) and second the subclasses which inherit from the given class, also with access restrictions. This option is useful to find out things like the class dependencies or multiple inheritance.

-cmdline (CFT, CST, DFT, FFT, LFT)
Print the command line options at the beginning of the output file as a remark for the generation rules of that output file. The contents of commandlist and filelist files are indented after the listfile name.

-c[s] (CFT, CST, DFT, FFT, LFT)
Display the number of calls to each function/data type, 's' sorts by the number of calls (DEFAULT ORDER: lexicographical). Useful to find out which functions/data types are never called/used (maybe unnecessary and deletable) and which ones are the most frequently called/used (together with profiler results a subject for further optimization efforts).

-dn (CFT, CST, DFT, FFT, LFT)
Set the maximum function/structure/union nesting level for output generation to 'n' (DEFAULT: maximum value n = 999). This means that the request for displaying a deeper level will be rejected and the output tree chart will be truncated at the given level.

-e[char] (CFT, CST, DFT, FFT, LFT)
Generate formatted ASCII text files with function/data type list and file list. All entries are separated by the optional 'char' character; if 'char' is not defined, the tabulator character is used as DEFAULT SEPARATOR. If spaces are wanted as separating characters, you have to write -e" ". Such prepared files can be used directly as input to other programs like word processors (e.g. MS-WORD for WINDOWS) or spreadsheet calculators (e.g. MS-EXCEL), for example for documentation purposes.
The following files are created:

    CFTITEMS.TXT:
        Contents: function name, return type, file name, line #, total # of function bytes, # of function comment bytes, # of function lines, # of control statements, # of brace levels
    CSTITEMS.TXT:
        Contents: data type name, file name, line #
    CFTFILES.TXT and CSTFILES.TXT:
        Contents: file name, # of lines, file size in bytes, # of comment bytes, # of functions/data types

-f (CFT, CST, DFT, FFT, LFT)
Generate an output list in short form, only with the function/data type names, no further description of the internal function/data type elements.

-g[name] (CFT, CST, DFT, FFT, LFT)
Read a previously generated database (see option -G). The additional parameter 'name' (path and file name) is used as a unique base name for the set of database files (up to 6 significant characters), the DEFAULT NAME 'CXT' is used if no name is specified. If 'name' ends with a (back-)slash, it is used as a pathname. Every source file will be tested for changes of file creation time and file size and a warning message will be given to inform the user.

-h[elp] (CFT, CST, DFT, FFT, LFT)
See option -?.

-iname (CFT, DFT, FFT, LFT)
Ignore function member 'name' in output tree chart. It will not be displayed and will be skipped instead if found as a function member. This option can be useful if, for example, functions are used only for test purposes and are of no further interest for the user and should be ignored in the output tree chart.

-l (CFT, DFT, FFT, LFT)
List a function only once in case of repeated consecutive calls (DEFAULT: display every occurrence). If a function is called more than once inside a function without any other call in between, there will be only one reference of that function call in the output tree chart. This option results in shorter output files.

-mtype (CST)
Start the data type tree chart with data type 'type' (-mtype).
If -m+ is specified, the output starts with the topmost data type, this is the data type which is in the highest level of the hierarchy tree chart. The default output is in lexicographical order of the displayed data types. Useful if a selected structure/union should be displayed at the beginning of the output file.

-m[name] (CFT)
-mname (DFT, FFT, LFT)
Start the function tree chart dump with function 'main' (-m) or 'name' (-mname), name is case sensitive. If -m+ is specified, the output starts with the topmost function, this is the function which is in the highest level of the hierarchy tree chart. If this option is not set, the default is lexicographical order of the displayed functions. Usually, the complete function tree chart should start with the 'main' function so that every subfunction is a (sub-)member of 'main'. This option is useful for Windows programs to start the output with the initial 'WinMain' function (-mWinMain) instead of 'main'. It can also be used to start the output with the initial assembler start-up code being executed before the 'main'-function is called.

-n[a] (CFT, CST, DFT, FFT, LFT)
Display the most critical function call path or, for data types, the data structure/union with the maximum nesting level. The modifier 'a' is used to display every function/structure with its users/callers (DEFAULT: display only deepest call path). This option helps to determine the complexity of the function call/data structure hierarchy and finds recursions over several call/nesting levels. Note that for functions the maximum call path being displayed is the result of the static source code analysis. During program execution the call path can be even deeper if functions are called indirectly with function pointers.

-ofile (CFT, CST, DFT, FFT, LFT)
Write the generated analysis results to file 'file'. DEFAULT BEHAVIOUR: The file names are 'CFT.LST' for CFT/CFT386 and 'CST.LST' for CST/CST386.
Possible overwriting of an existing output file with the same name other than the default one will be detected and the user will be prompted for confirmation. The resulting output file is an ASCII text file with no formatting characters which can be printed with every printer, viewed and/or edited with every text editor and taken as input to word processors, for example for documentation purposes.

-p (CFT, CST, DFT, FFT, LFT)
Calculate the program code/file size ratio for every file and make a final summary. This option gives a short overview about the 'real' file contents versus complexity. The computed value is in the range from 0.000 (only comment, no code) to 1.000 (only code, no comment). Used together with -P, the results may not be absolutely correct because of the macro expanding and removing of parts of the source code by '#if...' control blocks. If preprocessing -P is enabled, comment byte count in included files will not be performed. If option -q is set, -p will not calculate values related to comments.

-q (CFT, CST)
Remove comments from preprocessed files, the default is not to remove them. This option is only valid with option -P. It also affects the -p option because counting comments is not possible and calculations on them cannot be done.

-qn (FFT)
Set the number of continuation lines to 'n' (DEFAULT: 19 lines). The number must be in the range from 0 to 99.

-r (CFT, CST, DFT, FFT, LFT)
This is almost the same as option -x, but an additional file reference with the file name and the line number of the declaration will be given (includes -x). The -r or the -x option is STRONGLY RECOMMENDED and should be used as a default option, because without it, every function/data type will be completely redisplayed, including the underlying subtree of functions or data types, whenever it occurs in the output tree chart. The resulting output file will then grow immensely, up to several megabytes, if there is enough disk space to write the output file.
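
The size effect described for -r and -x can be seen with a small sketch. The call graph, the function names and the output layout below are invented for illustration and are not the actual CFT output format. The sketch only shows the principle: with cross referencing, a function is expanded once and every later occurrence is reduced to its reference number.

```python
def dump_tree(graph, root, cross_reference=True):
    """Return the lines of a call tree dump starting at 'root'.

    graph maps a function name to the list of functions it calls
    (a hypothetical stand-in for the result of a source scan).
    """
    numbers = {}   # function name -> reference number, assigned on first display
    lines = []

    def visit(name, depth, path):
        first_time = name not in numbers
        if first_time:
            numbers[name] = len(numbers) + 1
        label = "%s%s (%d)" % ("    " * depth, name, numbers[name])
        if name in path:
            # the function is one of its own callers: stop here
            lines.append(label + " [recursion]")
            return
        if cross_reference and not first_time:
            # already displayed once: print only the reference
            lines.append(label + " [see above]")
            return
        lines.append(label)
        for callee in graph.get(name, []):
            visit(callee, depth + 1, path + (name,))

    visit(root, 0, ())
    return lines


calls = {
    "main": ["load", "process", "save"],
    "load": ["read_block"],
    "process": ["read_block", "log"],
    "save": ["log"],
    "read_block": ["open_file", "log"],
    "open_file": ["log"],
}

full = dump_tree(calls, "main", cross_reference=False)
short = dump_tree(calls, "main", cross_reference=True)
print(len(full), "lines without cross reference,", len(short), "lines with it")
print("\n".join(short))
```

Even in this tiny graph the repeated subtree of 'read_block' is written out twice without cross referencing. In a real project with deeply nested, frequently called functions the difference grows quickly.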
-s (CFT, CST, DFT, FFT, LFT)
Used with -C, this option gives additional information. For CFT for every function: the number of lines for the function body, the maximum brace levels, the number of bytes for the function body and the number of comment bytes inside the function body. The average values for every source file are computed and displayed. For CST for every data type: number of type elements, number of subelements (nested structures/unions).

-time (CFT, CST, DFT, FFT, LFT)
Print runtime information about the times consumed for source analysis, preprocessing, output dump, database reading and writing and for other miscellaneous jobs plus the total time. The results are given in the format MINUTE:SECOND.MILLISECOND.

-u (CFT, DFT, FFT, LFT)
List undefined functions. These functions are probably library functions, defined in other files which have not been scanned or are unresolved externals found by the linker.

-vpath (CFT, CST, FFT)
Set a specific path for the intermediate precompiler output file. This option is useful to speed up execution when the intermediate file can be stored on a RAM-disk so that file access to the precompiled file is much faster than on a hard disk. Environment variables like 'TMP' or 'TEMP' to set the path for temporary files are not evaluated.

-x (CFT, CST, DFT, FFT, LFT)
Cross reference in case of multiple use. Every function and data type will be given a unique reference number which will furthermore be used as an identifying reference number for the function or data type if it is displayed again. See also option -r for further descriptions.

-y (CFT, CST, DFT, FFT, LFT)
Display cross link list of files which contain referencing and referenced functions/data types of functions/data types of a specific file. This option shows the relations in the following form:

      1. referencing file
      ...
      n. referencing file
    file
      1. referenced file
      ...
      m. referenced file

This option is useful if you want to find out the file relationships. This information can be used to isolate specific files from a project, e.g. library files. It is also useful if you want to separate a function and want to know which other files are needed because they contain called functions.

-z (CFT, CST, DFT, FFT, LFT)
Generate a function/data type call cross reference table. For every function/data type the location of its definition (file, line) and a complete list of its calls/references, sorted by files and line numbers, is given in the following form:

    1. function/data type (reference #) [file #], line #
       [file #]: line #, ...
       ...
    2. ...
       ...

The functions/data types are displayed in lexicographical order. At the end of the section is the cross reference file list.

-// (CFT, CST)
Accept C++ comments '//...' in C source code. This option can be used to ensure compatibility with C compilers which can also recognize C++ comments within C source code (like Microsoft and Borland).

-? (CFT, CST, DFT, FFT, LFT)
Shows the command line syntax and gives a short, but complete help information about the accepted commands and their syntax.

COMMAND LINE FILES

cmdfile (CFT, CST, DFT, FFT, LFT)
Specifies a file with (additional) command line options. This might be useful if the command line would be too long because of the number of options and files declared or if you are usually using the same options which can then be stored in a command file. The initial '$'-character is required to mark a command file.

filelist (CFT, CST, DFT, FFT, LFT)
A file with a list of source file(s) to be processed, wildcards are accepted. The list file should have every file on a single line. The rules for files containing assembler code and path translation are described above. The initial '@'-character is required to mark a filelist file. The '+' sign for subdirectory processing is also possible inside the filelist file.
[+]file (CFT, CST, DFT, FFT, LFT)
The name of a source file to be processed. More than one file can be specified on the command line. The default assumption for the given files is that they contain C source code. Assembler source files are only recognised by the file extension '.ASM' (80x86 MASM/TASM) and '.S' (Intel 80960, GNU). The '+' sign indicates that, starting from the given directory, all subdirectories should be searched recursively for the given file name search pattern. This addition is useful if a large software project is divided into several modules with separate subdirectories for each module. In that case only the starting (root-)directory with the requested file name search pattern must be specified to search the current directory and all subdirectories. If the file name or the include file specification inside a file contains a relative path ('./', '.\', '../' or '..\') it will be translated into an absolute path starting from the current working directory or, in case of include files, depending on the path of the parent file. Command line wildcards '*' and '?' are possible and will be accepted.

REMARKS ON USING OPTIONS

NONE OF THE ABOVE DESCRIBED OPTIONS IS PREDEFINED SO IT'S UP TO THE USER HIMSELF TO CUSTOMIZE HIS PREFERRED PROCESSING BEHAVIOUR AND OUTPUT STYLE BY ADDING CONTROL OPTIONS NEEDED THEREFORE.

This assumption seems to be the best way to give the users the freedom of making their own decisions about the features they really need for doing their work. However, some of the above described options should be regarded and used as 'DEFAULT' options to generate a readable, complete and useful output file without unexpected side effects. So the minimum default command lines look like

    CFT -m -ra <files>
    CST -ra <files>

Both command sets generate a complete listing containing all items with file name and line reference and a cross reference id for repeated use (options -ra).
The option -m for CFT forces the output to start with the 'main' function (if found). The precompile option -P is not strictly necessary though for exact results it should also be set together with the -T option. The standard default command line might be

    CFT -m -rauspMP -T<type> -cs -Cs -na -Zs -G <file[s]>
    CST -rapMP -T<type> -cs -Cs -na -Zs -G <file[s]>

If you start using CFT and CST for your own work, take these options as a basic set and try other options to get a feeling for what they are useful for and how they affect the output. The large number of options may be confusing for beginners but this is the only way to give the users the flexibility of customising their own output. Therefore, take some time to learn about CFT and CST and their features, read this manual carefully and gain your own experience with this software.

It is possible to declare more than one source file, command file and list file on the command line. In that case they will be processed in the order they appear. Files and options can be placed in mixed order on the command line, there is no recommended order for them because all options (also those inside command files!) will be processed before any source files are scanned.

The maximum command line length for DOS is 127 characters, so this is a system dependent 'natural' limit for the options and file names being declared. If you have more items to declare, place them into command list files and file list files, which do not have such limitations.

Options can also be defined by the environment variables CFT and CST (also used for CFT386 and CST386) like

    SET CFT=...
    SET CST=...

To separate single options in the environment string, spaces are required. See also the description for the -D option for remarks on environment variable definitions.

The rules for the interpretation of options are

    1. if defined, all options in the environment variables CFT (for CFT and CFT386) or CST (for CST and CST386) will be taken,
    2. the command line options and the option files will be interpreted in the order they appear.

If an option is declared more than once with different values, previous declarations will be overwritten by the newest one.

If options are represented by a single character with no additional optional values possible, like -r or -a, they can be grouped together with a single leading '-' in front like '-rasM', which is the same as '-r -a -s -M'. The last option in a group, however, can have additions, for example '-rasMmWinMain', which is evaluated as '-r -a -s -M -mWinMain'.

If an option can have an additional parameter, the parameter must follow the option character directly, without a space. A space after the option character means that no additional parameter is given for this option.

File names being composed of drive letter, directory name, file name and file extension, in the following referred to simply as 'path name', are treated by some special procedures to force a unique style of their internal representation:

    - path names are always considered not case sensitive, so there is no difference in upper case, lower case and mixed case path names (the reason is that DOS does not make any difference),
    - path names containing './', '.\', '../' and '..\' (so called 'relative paths') are expanded and transformed into absolute paths,
    - the recommended directory delimiter is '/' (UNIX-style), if a '\' (DOS-style) is recognised in a path name, it will be replaced by '/',
    - path names are always expanded and transformed into the default style <DRIVE LETTER>:<DIRECTORY PATH>/<FILE NAME> to get a unique representation for every file name that must be handled during processing,
    - file names have a DOS like maximum length of 12 characters: '<8 characters name>.<3 characters extension>', this is also true for the Windows NT and OS/2 versions of the SXT programs.

These actions are done with every path name during file processing. File names given on the command line are also transformed.
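
The path name rules listed above can be sketched in a few lines of Python. This is only an illustration of the described transformation, not the code used by the SXT programs: the function name 'normalize_path_name' is hypothetical, the current working directory is passed in explicitly, and folding everything to upper case is an assumption (the manual only states that case is not significant). The 8.3 file name length limit is not checked here.

```python
import ntpath  # DOS/Windows path rules, usable on any platform


def normalize_path_name(path, cwd):
    """Return 'path' in the style <DRIVE LETTER>:<DIRECTORY PATH>/<FILE NAME>.

    cwd is the current working directory including the drive letter,
    e.g. "C:\\PROJ\\BIN".
    """
    # expand relative paths ('./', '.\\', '../', '..\\') against cwd
    absolute = ntpath.normpath(ntpath.join(cwd, path))
    # '/' is the internal directory delimiter; case is not significant,
    # so fold to upper case (an assumption made for this sketch)
    return absolute.replace("\\", "/").upper()


print(normalize_path_name("..\\src\\Test.c", "C:\\PROJ\\BIN"))
print(normalize_path_name("./main.c", "c:\\proj"))
print(normalize_path_name("D:\\LIB\\x.h", "C:\\PROJ"))
```

With such a unique representation two different spellings of the same file, for example '..\src\Test.c' and 'c:/proj/src/test.c', compare equal as plain strings.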
If you want to perform database generation (option -G) for different projects, you are responsible for separating them and avoiding overwriting of existing databases. This can be done either by giving the databases different names so that the database files can all be placed in the same directory, or by writing every database into its own directory. If you want to access the databases be sure to use the correct name and/or path, also within the BRIEF or MicroEMACS editors.

COMMAND LINE EXAMPLES

1. CFT -m -rau *.c

This program invocation of CFT processes all files with the extension ".c" in the current directory and generates an output file starting with the "main"-function (option -m) for the output tree. Every function will be displayed with file and line number reference and a cross reference number (option -r). All functions will be shown in lexicographical order (-a), also undefined ones (-u).

2. CFT -mWinMain -rausMP -TMSC70,L -Id: -cs -Cs -na -ve: -C++ *.c ..\*.c *.cpp

This invocation is similar to the one described above with some extensions. The source files from the current (*.c, *.cpp) and from the parent (..\*.c) directory will be preprocessed (-P) with MS-C 7.0 defines for large memory model (-TMSC70,L), the include file path will be taken from the environment variable "INCLUDE" (default for -P) and the path "d:" (-Id:) will also be searched. The precompiler output is stored in path "e:" (-ve:). C++ extensions and keywords will be recognised if they occur (-C++). The output will start with the "WinMain"-function (-mWinMain). There will be a sorted call statistic (-cs) and a function summary for every scanned file (-Cs) with additional information for every function (-s). The critical function call path for all functions will be calculated and displayed (-na) and the included files of every source file will be shown (-M).

3. CST -S"struct _test" -r *.h -W2 -C++

Start CST to scan all files in the current directory with extension ".h" for data types. They will be displayed with file name and line number reference and cross reference number (-r). The output should be done for the data type 'struct _test' (-S"struct _test"). The warning level is set to "2" (-W2).

4. CFT y.c -R -Dmain=main_entry z.c -P x.c

Start CFT to produce a reverse calling tree (-R) of the functions found in the files "x.c", "y.c" and "z.c" in the current directory. The files will be preprocessed (-P) before file scan, the name "main" will be replaced by "main_entry" during preprocessing (-Dmain=main_entry).

5. CST $cst1.cmd $cst2.cmd -ve:\tmp @cstfiles +*.h -olist.v1a

This invocation of CST receives its options from the command files "cst1.cmd" and "cst2.cmd" and stores the preprocessor output in path "e:\tmp" (-ve:\tmp). The files being processed are defined in the source list file "cstfiles" and on the command line by "+*.h". The "+*.h" file specification searches the current directory and all subdirectories for files with the extension ".h". The output file will be named "list.v1a" (-olist.v1a).

6. CFT -ra -PGNUINC -TGNU -M c:\gnu\src\*.c c:\gnu\src\*.s -d10

CFT scans all files with extension ".c" and ".s" in the directory "c:\gnu\src". They will be preprocessed with an include file path defined in environment variable "GNUINC" (-PGNUINC) for compiler type "GNU" (-TGNU). The output contains all functions (-a) with complete reference information (-r) and a list of all included files for every source file (-M). The output tree will be truncated if the nesting level is higher than 10 (-d10).

7. CST *.c

CST processes all files with extension ".c" in the current working directory. There are no options specified, so only the options set by the environment variable 'CST', if present, will be used to customise the program execution. As an example, the command line options used in example 6 can be defined as environment variable CST by 'SET CST=-raMKPGNUINC -TGNU -d10'.

8. CFT -ra -PI960INC -TI960,KB *.c *.s

CFT scans all files with extension ".c" and ".s" in the current directory. They will be preprocessed with an include file path defined in environment variable "I960INC" (-PI960INC) for compiler type "I960", 'KB' architecture (-TI960,KB). The output contains all functions (-a) with complete reference information (-r).

9. CFT -rRM -gproj40 -Gproj41

CFT reads the database named 'proj40' (-g) and produces as output the reverse function call tree (-R) with complete reference information (-r), the (include) file interdependencies (-M) and a new database named 'proj41'.

10. CST -g -Gnew -N

CST reads the default database (-g) and produces as output another database named 'new' (-Gnew). No other output file is generated (-N).

11. CST -N -OTEST -O+ test.h

CST reads the file "test.h", generates no output file (-N), but a byte offset calculation file for data type 'TEST' (-OTEST) and its enclosed type members (-O+).

9 OUTPUT DESCRIPTION AND INTERPRETATION

This section gives an overview of the files being generated by CFT and CST and the interpretation of the results. Different files are produced as output depending on the options being set by the user. Usually, if -N is not set, all information is written to the default output file CFT.LST or CST.LST or to the file specified by the -o option. The internal structure of these files and their meanings are described below.

If database generation is enabled with option -G, several files are produced. They all have a common database name to identify the files that are related to a project. The file extension '.DBF' marks the dBASE compatible database files, the file with the extension '.CMD' contains the command line options and the file with the extension '.SRC' contains all source files that were processed. For further information refer to the corresponding section in the syntax description.
CFT OUTPUT

The output file is divided into several sections. Some of the sections listed are generated by default (-), others are optional (o) and only displayed if they are enabled by a command line option. Also, the default sections can be customised to produce the desired output. The sections generated for CFT are (in the order they appear):

    - file header
    - function calltree/called-by hierarchy listing (-r, -R, -x, -a, -m, -f, -dn, -V, -l)
    - function summary
    - multiple defined functions and their location (only if detected)
    - overloaded functions and their location (only if detected)
    o undefined functions (-u)
    o function call statistics (-c[s])
    o function caller/member relations (-Z[s])
    o function call cross reference table (-z)
    o critical function call path (-n[a])
    o source file - include file dependency (-M)
    o function tables for source files (-C[s], -s, -q)
    - file information summary (-p, -q)

Each function is displayed like:

    int test() (1) <DMPCA> <TEST.C, 100>

with the following meanings

    - int           : function return type
    - test()        : function name
    - (1)           : function reference number
    - <DMPCA>       : found as (one or more of) D = definition, M = macro, P = prototype, C = function call, A = assembler function
    - <TEST.C, 100> : file name, line number

The line number is the line where the function definition block starts with its initial '{' and not the line where the function name resides. I think that this is the best solution because it is the point where we really go inside the function block. This convention is also used by source level debuggers which point to the line with the opening brace on function entry.

CST OUTPUT

The output file is divided into several sections. Some of the sections listed are generated by default (-), others are optional (o) and only displayed if they are enabled by a command line option. Also, the default sections can be customised to produce the desired output.
The sections generated for CST are (in the order they appear):

    - file header
    - data structure calltree/called-by hierarchy listing (-r, -R, -x, -a, -m, -f, -dn)
    - data type summary
    - multiple defined data types and their location (only if detected)
    o data type call statistics (-c[s])
    o data type caller/member relations (-Z[s])
    o data type call cross reference table (-z)
    o maximum data type nesting (-n[a])
    o source file - include file dependency (-M)
    o data type tables for source files (-C[s], -s, -q)
    - file information summary (-p, -q)

Each data type is displayed like:

    struct _test (1) <BSUCE> <TEST.C, 90> <TEST.C, 60>

with the following meanings

    - struct _test  : type specifier
    - (1)           : reference number
    - <BSUCE>       : data type (one/none of): B = basic type (void, char, int, ...), S = struct, U = union, C = class, E = enum
    - <TEST.C, 90>  : file name, line number of type definition (only printed if necessary)
    - <TEST.C, 60>  : file name, line number of basic type definition

The two locations for the data type can occur if the data type is first defined and later assigned via 'typedef' or by '#define' (if -P is not set) to another data type name:

    test.c:
    ...
    line 60: struct xyz {...};
    ...
    line 90: typedef struct xyz struct _test;
    ...

Their definitions are on different lines but both data type names refer to the same data structure. Like the convention used for functions, the line number is the line where the structure, union, enumeration or class type definition block starts with its initial '{' and not the line where the type name resides.

For an example session and more detailed information about the generated output of CFT and CST see the file EXAMPLE.DOC.

OUTPUT INTERPRETATION

Besides the hierarchical structure chart of the function and data type relationships, the resulting output contains a lot of useful information about the program which can be used for optimization, reuse or maintenance purposes.
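
If the listing is to be evaluated by your own scripts, the entry layout described under CFT OUTPUT and CST OUTPUT can be split into its fields. The following Python sketch assumes that an entry line looks exactly like the two sample lines shown in this manual. The function name 'parse_entry' and the returned field names are hypothetical, and real listing lines may carry additional text that this pattern does not handle.

```python
import re

# text, "(reference #)", "<flags>" and zero or more "<file, line>" locations
ENTRY = re.compile(
    r"^(?P<name>.+?)\s+\((?P<ref>\d+)\)\s+<(?P<flags>[A-Z]*)>"
    r"(?P<locations>(?:\s+<[^<>,]+,\s*\d+>)*)\s*$"
)
LOCATION = re.compile(r"<([^<>,]+),\s*(\d+)>")


def parse_entry(line):
    """Split one entry line into its fields, or return None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    locations = [
        (file_name, int(line_number))
        for file_name, line_number in LOCATION.findall(match.group("locations"))
    ]
    return {
        "name": match.group("name"),            # e.g. "int test()" or "struct _test"
        "reference": int(match.group("ref")),   # cross reference number
        "flags": match.group("flags"),          # e.g. "DMPCA" or "BSUCE"
        "locations": locations,                 # list of (file name, line number)
    }


print(parse_entry("int test() (1) <DMPCA> <TEST.C, 100>"))
print(parse_entry("struct _test (1) <BSUCE> <TEST.C, 90> <TEST.C, 60>"))
```

For tabular data the files written by option -e are probably the easier starting point, because their fields are already separated by a single character.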
Identifying the most frequently called functions is a good way to find candidates for further optimization. Low-level functions with many callers but no called subfunctions are ideal for reuse. Functions with no callers may be useless and can be discarded, provided they are not called via function pointers either. The chance of finding errors in complex functions with many lines of source code, many called functions and a lot of control statements is much bigger than in simple functions.

10 INTEGRATION INTO PROGRAM DEVELOPMENT ENVIRONMENTS

Invoking CFT and CST directly from inside editors or integrated programming environments (IDE) and displaying the results can be a very useful feature during program development. With advanced IDEs like those of Borland C++ or Microsoft PWB this is an easy task.

The Borland IDE has in its system menu a section with 'transfer' items. It contains programs that can be invoked from inside the IDE like TASM or GREP. To add CFT and CST as new entries you have to go to the OPTIONS menu and open 'TRANSFERS...'. Choose a free entry in the table and select EDIT. A window will open with 3 edit lines.

In the first line called 'Program Title' you must write 'C~FT' or 'C~ST' as the name being displayed in the transfer section. The '~' precedes the hot-key characters 'F' and 'S'.

In the second line called 'Program Path' you must write 'CFTIDE' or 'CSTIDE', with the complete path if necessary. 'CFTIDE' and 'CSTIDE' are two batch files which perform the invocation of CFT and CST together with the necessary options. These batch files are part of the CXT package, you can change the options defined there if you need other ones.

In the third line called 'Command Line' you must write the macro commands '$EDNAME $NOSWAP $CAP EDIT'. These macros transfer the file name in the current edit window ($EDNAME) to the batch file, suppress window swapping ($NOSWAP) and capture the processing results in a separate edit window ($CAP EDIT).
The last step is to save these entries, then the integration is completed and CFT and CST can be used as if they were built-in functions. The processing results are shown in an edit window which can be scrolled, resized or moved. By adding CFT and CST to the IDE it is much easier for the programmer to use these tools.

11 TOOLS FOR DATABASE PROCESSING

To access information stored in a database, the following utilities are available for the SXT programs:

    CFTN    C Function Tree Navigator
    CSTN    C Structure Tree Navigator
    DFTN    DBASE Function Tree Navigator
    FFTN    FORTRAN Function Tree Navigator
    LFTN    LISP Function Tree Navigator

They can be used to recall the file name and line number of a specific item (function or data type) from the database. If the requested item is found in the database, it will be displayed with the location where it is defined or where it is found for the first time if there was no definition found during processing.

As an additional feature, editors like BRIEF 3.0, QEDIT 2.1/3.0 or MicroEMACS 3.11 can be invoked directly with the information to open the target file and to move the cursor to the line where the searched item is located. For BRIEF there are several macros available to perform searching inside the editor. A new edit window with the file at the location of the requested item will be opened if the search was successful. Also both MicroEMACS editor versions for DOS and WINDOWS are supported. Some of these actions are also possible for QEDIT, with slight limitations due to the macro programming capabilities.

Other user programmable editors which should be able to work with CFTN and CSTN are EPSILON, ME, KEDIT, Codewright, Multi-Edit, JED, GNU-EMACS ports like DEMACS or OEMACS, the Microsoft editor M or integrated development environments like Borland IDE or Microsoft PWB (this list may not be complete).
You can try to integrate CFTN and CSTN into these systems by using the BRIEF, QEDIT or MicroEMACS macro files as examples for your own integration development.

The version numbers for the editors mentioned in this manual indicate those versions for which the described capabilities have been tested.

PRECOMPILED SOURCE FILES

Sometimes, if the precompile option -P was used to process the C/C++ source files related to the database, the results of searches seem to be wrong. This can happen if an identifier in the source code is in fact defined as a macro and has been replaced during preprocessing, so that the source processed by the analyser differs from the original source. The cursor will then point to an obviously wrong location, or the search will fail. An identifier which is in fact a macro name is unknown and not accessible after precompiling.

It is also possible that a function used in the original source cannot be found in the database. The reason is that the function is in fact a 'function like' macro and was replaced during preprocessing. If differently named macros have identical definitions, a search for an item may point to a location other than the requested one. If the -P option is not set, the same item can have several 'alias' names due to macro definitions. If the source code contains explicit #line numbers, searching for a specific line may also fail. Keep these exceptions in mind for a correct interpretation of the results when using the database.

IMPORTANT NOTICE

Information recalled from the database may not be valid if processed files were edited and changed after the database was generated. The resulting errors include references to wrong files and/or lines if source lines have been deleted or inserted, failed searches if names have changed, and failed accesses to files which have been renamed, moved or deleted.
To avoid these errors, the recall programs perform a consistency check of the file creation date/time and file size. If inconsistencies are recognised, the user will be informed that the database is not up-to-date and should be updated by processing the source files again.

SYNTAX:

   CFTN [options] pattern
   CSTN [options] pattern
   DFTN [options] pattern
   FFTN [options] pattern
   LFTN [options] pattern

OPTIONS

   -Eeditor
      Specifies the editor command line for option -e; overrides the default and the environment values. See the section about environment variables for further information about the required format.

   -F
      Print all file names which are related to the database. This option is useful to get a complete overview of all files of the project.

   -a
      Print all function/data type names. Useful to generate a list of items, for example as input to other programs.

   -B
      Same as -a, but additionally prints the internal database record number. Used by BRIEF macros.

   -bform
      Run the search in batch mode. If the requested item was found, the location will be displayed on a single line as "file name line number" (DEFAULT STYLE); otherwise nothing is printed to indicate that the search failed. The output style can be changed by specifying 'form' to override the default style. As for option -E, you can specify the exact locations where the file name and line number should be inserted by defining a format string with %s and %d (see also the section about environment variables). For example, the format to generate a command line for invoking BRIEF, QEDIT or MicroEMACS would look like

         cstn -b"b -m\"goto_line %d\" %s" ...   (BRIEF)
         cstn -b"q %s -n%d" ...                 (QEDIT)
         cstn -b"me -G%d %s"                    (MicroEMACS)

      This option gives you great flexibility in generating output for your own purposes, for example to write a batch file or for further use in other programs.

   -e
      If the requested item is found, an editor will be invoked to display the file containing the requested item.
      There are three different ways to specify the editor command line (evaluated in that order):
         1) use option -E,
         2) define the environment variables CFTNEDIT, CSTNEDIT or CXTNEDIT,
         3) if nothing is specified, BRIEF as the default editor (if present) will be invoked with the file name and line number of the item to move the cursor to its location. Ensure that the PATH environment variable is set correctly, including the path for the BRIEF directory.

   -fname
      Use 'name' as base name (path and file name) for the database files. It is also possible to use environment variables (CFTNBASE, CSTNBASE, CXTNBASE) to define the database names. If neither -f nor the environment variables are set, a DEFAULT NAME will be used (see also option -G in the CFT and CST syntax description). This allows the use of different databases, for example, generated for different projects. See also the section about environment variables for further information.

   -r#
      Print the location of a selected item with matching pattern and record number #. This option requires -b. Used by BRIEF macros.

   -Ritem
      Print a cross reference list of every occurrence of 'item' with complete file name and line number.

   -Dfile
      Print a list with the contents of 'file'.

   -o[name]
      Print output to file 'name'. If 'name' is not specified, DEFAULT NAMES are used: CFTN.OUT or CSTN.OUT respectively.

   pattern
      The item to search for in the database. This can be either a function name (CFTN) or a data type name (CSTN). There are three different ways of searching, depending on how 'pattern' is given:

         pattern     exact search,
         pattern*    the beginning of the item must match pattern,
         *pattern    a substring must match pattern.

      If the item to search for consists of more than one word (contains spaces), the search pattern must be quoted like "struct _iobuf" to ensure that these words are interpreted as a single pattern.
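The three search forms described for 'pattern' can be illustrated with a short Python sketch. This is not the navigators' implementation; the function name `matches` is hypothetical, and case-sensitive comparison is an assumption that the text above does not state.

```python
def matches(pattern: str, item: str) -> bool:
    """Return True if 'item' matches a navigator-style search pattern.

    Assumption: comparison is case-sensitive (not stated in the manual).
    """
    if pattern.startswith("*"):
        # "*pattern": the text after the '*' may occur anywhere in the item
        return pattern[1:] in item
    if pattern.endswith("*"):
        # "pattern*": the item must begin with the text before the '*'
        return item.startswith(pattern[:-1])
    # plain "pattern": exact search
    return item == pattern


# Hypothetical item names, for illustration only
items = ["main", "malloc_node", "struct _iobuf", "union REGS"]
print([name for name in items if matches("ma*", name)])
print([name for name in items if matches("*REGS", name)])
print([name for name in items if matches("struct _iobuf", name)])
```

With this reading, a lone `*` matches every item, which agrees with the use of `CFTN *` to list all functions.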
RETURN VALUES

The following values are returned to DOS or the calling program to report the result of the database search:

   100   searched item not found,
   101   searched item found,
   102   searched item found, but the source file may have been changed (creation date and/or file size are not equal) since the creation of the database (database is not up-to-date).

The returned value can be used to decide what action should be taken for different results, for example, if the database is not up-to-date.

ENVIRONMENT VARIABLES

CFTNEDIT, CSTNEDIT, CXTNEDIT:

The editor to invoke can be defined either by option -E or by defining the environment variables CFTNEDIT (for CFTN), CSTNEDIT (for CSTN) or the common variable CXTNEDIT (for both CFTN and CSTN) with the format string of the editor of your choice. The format string specifies the places where the file name and the line number should be inserted to pass this additional information to the editor. Use %s for the file name and %d for the line number. For example, the invocation of the default editor BRIEF could be defined like

   SET CFTNEDIT=b -m"goto_line %d" %s
   SET CSTNEDIT=b -m"goto_line %d" %s
   SET CXTNEDIT=b -m"goto_line %d" %s

where 'b' is the BRIEF editor, '-m' specifies the macro to be invoked when BRIEF starts, 'goto_line' is the macro name with '%d' as the place to insert the line number, and '%s' is the place for the file name. Note that this example cannot be used on the command line with the -E option because of the quotes. It is possible to change the order of %d and %s if another editor is used.
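The return values and the editor format string described above can be sketched in Python as follows. This is an illustration under assumptions, not the tools' own code: the names `describe_result` and `expand_editor_format` are hypothetical, and the sketch does not model how the navigators treat a format string that contains no %s at all.

```python
import re

# Return values documented for the navigator programs
RESULTS = {
    100: "searched item not found",
    101: "searched item found",
    102: "searched item found, but the database is not up-to-date",
}


def describe_result(code: int) -> str:
    """Translate a navigator return value into a readable message."""
    return RESULTS.get(code, "unknown return value %d" % code)


def expand_editor_format(fmt: str, file_name: str, line: int) -> str:
    """Insert file name (%s) and line number (%d) into an editor format string.

    Each placeholder is replaced wherever it appears, so %s and %d
    may occur in either order.
    """
    def substitute(match):
        return file_name if match.group(0) == "%s" else str(line)

    return re.sub(r"%[sd]", substitute, fmt)


# Hypothetical file name and line number, for illustration only
print(expand_editor_format('b -m"goto_line %d" %s', "MAIN.C", 42))
print(expand_editor_format("q %s -n%d", "MAIN.C", 42))
print(describe_result(102))
```

The first call produces `b -m"goto_line 42" MAIN.C` and the second `q MAIN.C -n42`, which shows that the order of the two placeholders does not matter.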
Here are additional configuration examples for other popular editors (examples are given for CFTN, similar for CSTN):

   EDIT (MS-DOS 5.0):
      SET CFTNEDIT=edit %s         or   -E"edit %s"       or
      SET CFTNEDIT=edit            or   -Eedit

   VDE 1.62:
      SET CFTNEDIT=vde %s          or   -E"vde %s"        or
      SET CFTNEDIT=vde             or   -Evde

   QEDIT 2.1/3.0:
      SET CFTNEDIT=q %s -n%d       or   -E"q %s -n%d"

   MicroEMACS 3.11:
      SET CFTNEDIT=me -G%d %s      or   -E"me -G%d %s"

The described notation allows users to customise CFTN and CSTN for their preferred editor and to perform additional actions during invocation. If your editor supports macro programming like BRIEF, you are free to write your own macros to do things similar to what the CXT.CM macro given for BRIEF 3.0 does. I think this is the most flexible way to give users control over this option and to help them work with their preferred programming environment and development tools.

CFTNBASE, CSTNBASE, CXTNBASE:

These environment variables can be used to specify the name of the database. Similar to the editor environment variables, CFTNBASE and CSTNBASE are related to CFTN and CSTN respectively, and CXTNBASE is used for both. For example, to specify the database 'proj1' located in directory 'd:\develop\projects', type

   SET CFTNBASE=d:\develop\projects\proj1
   SET CSTNBASE=d:\develop\projects\proj1

for a separate definition or

   SET CXTNBASE=d:\develop\projects\proj1

for a common definition of the database name.

COMMAND LINE EXAMPLES

   1) CFTN *
      Displays all functions in lexicographical order with their return types, file names and line numbers. Gives a short overview of all functions found.

   2) CSTN -e *
      Edit all data types in lexicographical order, using the default editor or the one defined by the environment variable CSTNEDIT or CXTNEDIT.

   3) CFTN -fproject1 -Evde -e main
      Search the database named 'project1' for function 'main' and edit with editor 'vde'.
   4) CSTN -b "union REGS"
      Search for data type 'union REGS' and display, if found, the file name and line number.

   5) CSTN -e -E"q %s -n%d" -fcft tmbuf
      Search database 'cft' for data type 'tmbuf' and invoke, if found, the editor 'q' (QEDIT 2.1/3.0) with the file name and line number.

SEARCHING INSIDE BRIEF (Version 3.0)

This feature is one of the most powerful enhancements for the BRIEF editor and offers the user full control over the complete source code of software projects, no matter how big they are and how many files they include. It extends the BRIEF editor into a comfortable hypertext source code browser and locator system. The browser allows its user to find and read important program constructs like functions and data types in several files simultaneously and to move between them. The complete project with its several source and include files appears as if it were a 'whole-part'. The browser helps programmers to learn about the existing program structures and supports them in developing new code and maintaining existing code. The generated output files CFT.LST or CST.LST (or the one created with the -o option) can be used to walk along the hierarchy tree chart and to select from there the function or data type that should be displayed in detail.

The following features are implemented as macros:

   - searching for a specific item, tagged or marked
   - building menus of all defined items
   - building menus of all references to a specific item
   - building menus of all processed files
   - building menus of all items defined in the current file
   - searching for a specific item cross reference number
   - changing the database name

Every function and data type can be accessed with just a keystroke by moving the cursor onto it ("tagging") and executing a macro to locate the item and zoom into the file where it is defined.
The user no longer has to remember the file names and locations where the functions and data types are defined, nor change files, directories and drives to access the files manually.

It is possible to build interactive dialog menus with all functions or data types in lexicographical order and to select an item to display. This is very useful to get a quick overview of all accessible functions and data types of the whole project. It is also possible to build an interactive dialog menu with all file names stored in the database, in lexicographical order, and to select one file to open for editing. Other menus are available for file contents lists and item cross references. All information needed to perform these actions is stored in the databases generated by processing the files related to the project.

To invoke CFTN and CSTN inside BRIEF, the macro file CXT.CM must be loaded (with <F9> CXT.CM), which makes the implemented macros available. These macros are

   MACRO NAME           KEY ASSIGNMENT (defined in CXTKEYS.CM)

   cft                  Shift F1
   cftmenu              Shift F2
   cftxrefmenu          Shift F3
   cftxrefmenuagain     Shift F4
   cftdefmenu           Shift F7
   cftfilemenu          Shift F8
   cftfind              Shift F11
   cftbase              Shift F12

   cst                  Ctrl F1
   cstmenu              Ctrl F2
   cstxrefmenu          Ctrl F3
   cstxrefmenuagain     Ctrl F4
   cstdefmenu           Ctrl F7
   cstfilemenu          Ctrl F8
   cstfind              Ctrl F11
   cstbase              Ctrl F12

   cxtbase              Alt Tab
   cxtsearchxref        Ctrl Tab
   cxthelp              <unassigned>

This macro key assignment list is also available within BRIEF as a help screen which can be invoked by the macro 'cxthelp'. The CXT help information is not part of the BRIEF help system because this would require modifications of the original BRIEF help files.

Instead of loading the file CXT.CM and typing the macro names manually, you can load the macro file CXTKEYS.CM, which automatically loads the CXT.CM file if any of the macros listed above is invoked with a hot-key.
To simplify working with this package, the CXTKEYS.CM macro file also contains key assignments for the macros. These hot-keys offer a "point and shoot" hypertext-like feeling. The macro source file CXTKEYS.CB contains the source code for CXTKEYS.CM, so that you can make changes such as adapting the key assignments to your personal needs or moving the initialization function to the BRIEF start-up macro file (for further information about BRIEF macros see the BRIEF manuals).

To load these macros and to execute CFTN and CSTN, which are invoked from inside BRIEF, be sure to set the directory path correctly. It is also necessary to allow access to the macro file DIALOG.CM, which contains the functions for dialog menu building and processing.

A search can be started by simply moving the cursor onto the item to search for, or by marking a block containing the item (necessary if the search pattern contains more than one word, like 'struct xyz'), and then running one of the following macros (or pressing the hot-keys):

   <F10> cft              (function search)
   <F10> cst              (data type search)

It is also possible to type the name of the item to search for manually. To do this, run one of the following macros:

   <F10> cftfind <item>   (function search)
   <F10> cstfind <item>   (data type search)

If the search was successful, a new window with the file containing the item will be opened and the cursor will be placed at the line where the item is located. If inconsistencies have been detected, the user will be informed. If the requested item or the source file containing the item is not found, a message will be given.

The macros for building the function and data type dialog menus are

   <F10> cftmenu          (function menu)
   <F10> cstmenu          (data type menu)

You can scroll through the entries and select an item which should be displayed.

To access databases other than the default ones, there are two ways to change the base names:

   1) Set the environment variables CFTNBASE, CSTNBASE or CXTNBASE (see description above).
      When the macro file CXT.CM is loaded, these variables will be used for initialization.

   2) To change the base names from inside BRIEF, there are three macros. They override the initial values given by the environment variables:

         <F10> cftbase    change base name for function search
         <F10> cstbase    change base name for data type search
         <F10> cxtbase    change both CFT and CST base name

With these features it is possible to set default values for the database files or to switch between different databases without leaving BRIEF, which gives the user a maximum of flexibility.

You can display a menu list with all source files scanned for the database by typing

   <F10> cftfilemenu        (CFT file menu)
   <F10> cstfilemenu        (CST file menu)

With this feature you can get a quick overview of all files related to the database.

Other menu driven options concern the display of all cross references to a specific item (see macro 'cst' for information about marking) with the macros

   <F10> cftxrefmenu        (CFT cross reference menu)
   <F10> cftxrefmenuagain   (show previous menu again)
   <F10> cstxrefmenu        (CST cross reference menu)
   <F10> cstxrefmenuagain   (show previous menu again)

and the display of a file contents list for the current source file with the macros

   <F10> cftdefmenu         (CFT file menu)
   <F10> cstdefmenu         (CST file menu)

To search for the first appearance of a specific cross reference number like '(123)' in a CFT or CST output listing file, move the cursor to the reference number and type

   <F10> cxtsearchxref      (search cross reference)

The macro extracts the complete number and searches for its first occurrence, starting from the beginning of the output file. With this macro you can move quickly from any reference to its initial description.

All the macro functions described above are defined in the BRIEF macro file CXT.CB. These macros make extensive use of the various options of CFTN and CSTN, which are described in detail earlier.
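The behaviour described for 'cxtsearchxref' (extract the complete reference number under the cursor, then look for its first occurrence from the top of the listing) can be sketched in Python. This is not the BRIEF macro itself; the function names and the sample listing lines are hypothetical, and only the '(123)' notation is taken from the text above.

```python
import re


def xref_at(line: str, column: int):
    """Return the reference number whose '(123)' token covers 'column'
    (0-based), or None if the cursor is not on a reference number."""
    for match in re.finditer(r"\((\d+)\)", line):
        if match.start() <= column < match.end():
            return int(match.group(1))
    return None


def first_occurrence(lines, number):
    """Return the 1-based number of the first line containing '(number)',
    searching from the beginning of the listing, or None if absent."""
    token = "(%d)" % number
    for index, text in enumerate(lines, start=1):
        if token in text:
            return index
    return None


# Hypothetical listing excerpt; the real CFT/CST listing layout may differ
listing = [
    "main (1)",
    "  init (2)",
    "  run (3)",
    "    init (2)",
]
number = xref_at(listing[3], 10)          # cursor on the '2' in line 4
print(number, first_occurrence(listing, number))
```

Matching the whole token, parentheses included, keeps a search for '(2)' from stopping at '(12)' or '(23)'.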
SEARCHING INSIDE QEDIT (2.1 and 3.0)

The popular shareware editor QEDIT with its macro programming capabilities allows, like the BRIEF editor, searching for functions and data types from inside the editor. The following examples of QEDIT macros act, with slight limitations, like the BRIEF macros 'cft' and 'cst':

CFT function searching, assigned to <SHIFT F9>:

   #f9 MacroBegin MarkWord Copy Dos 'cftn -b ' Paste '>tmp' Return Return EditFile 'tmp' Return AltWordSet MarkWord Copy DefaultWordSet EditFile Paste Return EditFile 'tmp' Return EndLine CursorLeft MarkWord Copy Quit NextFile GotoLine Paste Return

CST data type searching, assigned to <SHIFT F10>:

   #f10 MacroBegin MarkWord Copy Dos 'cstn -b ' Paste '>tmp' Return Return EditFile 'tmp' Return AltWordSet MarkWord Copy DefaultWordSet EditFile Paste Return EditFile 'tmp' Return EndLine CursorLeft MarkWord Copy Quit NextFile GotoLine Paste Return

These QEDIT macro definitions can be placed into the 'qconfig.dat' configuration file and added to 'q.exe' with the 'qconfig.exe' configuration utility (for additional details about QEDIT macro programming see the QEDIT documentation). The two macros perform the following actions:

   - mark the current word,
   - execute the CFTN or CSTN database search for the marked word via DOS and redirect the output to file 'tmp',
   - read the target file name from 'tmp' and open the target file,
   - read the line number from 'tmp' and go to the selected line.
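The last two of these actions amount to splitting the single line written by 'cftn -b' or 'cstn -b' into a file name and a line number. A minimal Python sketch, assuming the default "file name line number" style with the line number as the last space-separated word; the function name `parse_location` and the sample path are hypothetical.

```python
def parse_location(output_line: str):
    """Split a batch-mode result line into (file_name, line_number).

    Returns None for empty output, which is what batch mode produces
    when the search fails, or when the last word is not a number.
    """
    text = output_line.strip()
    if not text:
        return None
    # Split at the last space: everything before it is the file name
    file_name, _, line_text = text.rpartition(" ")
    if not file_name or not line_text.isdigit():
        return None
    return file_name.strip(), int(line_text)


# Hypothetical batch-mode output, for illustration only
print(parse_location("D:\\SRC\\MAIN.C 128"))
print(parse_location(""))
```

Unlike the QEDIT macros, this sketch checks whether the search produced any output before using it, which is the error check the macros lack.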
These macros work almost like those used with BRIEF, but they have some limitations in their functionality due to the limited capabilities of the QEDIT macro programming language:

   - there is no error check for a correct cursor location,
   - the searched item must always be a single word like 'main' or 'size_t'; a combined pattern like 'struct iobuf' cannot be searched,
   - there is no error check whether the search was successful or failed, or whether the database is not up-to-date,
   - if the target file is the same as that from which the search started and other additional files are also open (QEDIT ring buffer), a wrong file will probably be accessed,
   - the name of the database cannot be changed; the searches are performed either with the default database or with those defined by the environment variables.

SEARCHING INSIDE MicroEMACS (Version 3.11, DOS & WINDOWS)

The most recent editor to be supported with macros for database access is MicroEMACS 3.11. The macro file is named CXT_ME.CMD and should be placed in the MicroEMACS directory. This macro file works with both the DOS and the WINDOWS version of MicroEMACS 3.11. The following macros are available:

   - cft        function search for tagged item
   - cst        data type search for tagged item
   - cftmark    function search for marked item
   - cstmark    data type search for marked item
   - cftfind    function search for user defined item
   - cstfind    data type search for user defined item
   - cftfile    list of all CFT files
   - cstfile    list of all CST files
   - cftbase    set CFT database name
   - cstbase    set CST database name
   - cxtbase    set both CFT and CST database name

They can be invoked by loading the macro file CXT_ME.CMD with

   ESC CTRL+S CXT_ME.CMD

and running the macro with

   ESC CTRL+E <macro name>

If the macros are used with the MicroEMACS WINDOWS version, you may have to change the DOSEXEC.PIF file, which is part of the MicroEMACS 3.11 distribution package.
During CXT macro execution, the shell command may stop after it has finished and wait for the <return> key to be pressed before continuing. To avoid this interruption, edit the PIF file and select "Close window after execution".

The environment variables CFTNBASE, CSTNBASE and CXTNBASE are used in the same way as in the BRIEF version. No keys are assigned to the macro procedure names; if you prefer hot-keys, you are free to define them yourself. In the MicroEMACS WINDOWS version, however, the user accessible macros can be integrated into the "Miscellaneous" pull-down menu (thanks to the incredible macro programming capabilities of MicroEMACS!). To view the generated output file with its semigraphic frames, change the font type and select, for example, the 'TERMINAL' font from the OEM font list, which supports semigraphic characters.

12 TROUBLE SHOOTING

This section contains information about problems which may occur during the use of the SXT programs and their possible causes. It is strongly recommended that users read the complete documentation to get an overview of the features before they start using CFT and CST and run into unexpected trouble. See also the chapter about 'PROGRAM LIMITATIONS'.

A PROGRAM CANNOT BE EXECUTED

The program path is not specified in the environment variable PATH, the programs are not yet installed in the specified directory, or an attempt was made to start a 386 protected mode version on an 80286 (or lower) computer.

EXECUTION STOPPED WITH MESSAGE "OUT OF MEMORY"

An attempt to allocate memory has failed. Try to remove unnecessary memory resident TSR programs and/or use the protected mode versions if you have a 386/486. If this message appears with the protected mode versions, there is not enough free disk space for the swap file. Set the temporary directory, defined by the 'TMP' or 'TEMP' environment variable, to another drive, if possible.
WRITING THE OUTPUT FILE TAKES A LONG TIME

A large amount of information must be handled, option -x or -r is not set so that the output tree chart is very large, or the CPU and/or hard disk is slow. Use option -v to redirect intermediate files to a faster RAM-disk (if one is present).

THE RESULTING OUTPUT IS DEEPLY NESTED AND EXCEEDS THE SCREEN SIZE

Two possible reasons: the -r or -x option is not specified (use it if it is not already set), or the source code/data types are indeed deeply nested.

THE BRIEF MACROS CANNOT BE EXECUTED

The macro file is not loaded, or other macros with the same names or assigned keys already exist.

THE BRIEF OR MICROEMACS MACROS CANNOT BE LOADED

The path to the macro file location must be specified when loading the macros if they are not in the default directory for the editor.

THE BRIEF MACROS DO NOT FIND ANY FUNCTIONS OR DATA TYPES

There is no access to CFTN, CSTN, DFTN (...) due to an incorrect path specification, no database is present, the path to the database files is incorrect, or the database name is incorrect.

THE BYTE OFFSET CALCULATION FILE "CST_OFFS.C" CANNOT BE COMPILED

Several reasons: necessary data types or include files are not specified, or the CST processing was done with include files other than those used for compiling. If the amount of data type information is too large, some compilers cannot compile the large number of statements in the single file generated by CST ('out of heap space', 'code segment too large' or similar messages). In that case you may have to split the file into several smaller files or reduce the number of data types to display.

LOCATING ITEMS IN THE BRIEF EDITOR POINTS TO WRONG PLACES

Searching for items from within the BRIEF editor points to wrong lines, the requested item is not present there, or the file seems to be corrupted. This can have several reasons: The file is not up-to-date and has been changed since the database generation, so that the line references are no longer valid.
Another reason can be that the source file has explicit #line numbers, as is usual for files produced by source code generators like YACC/BISON or LEX/FLEX. A third reason may be that the source file was generated on a UNIX system and therefore has only LF instead of CR+LF as end-of-line delimiter, so that BRIEF cannot display the file correctly and the file seems to be written in a single line.

UNEXPECTED RESULTS WHILE RUNNING UNDER WINDOWS 3.1

The 386 versions cannot run under Windows 3.1 because they use the CPU exclusively and can therefore not co-exist with Windows; only the real mode versions can. In Windows enhanced mode (virtual 386 mode), the real mode versions cannot run simultaneously in several independent DOS windows if they are working in the same directory or use the same temporary directory, because the temporary intermediate files may have the same names and will conflict due to multiple accesses to the same file. This may also happen if the same files are scanned.

MICROEMACS FOR WINDOWS SEEMS TO HANG DURING DATABASE ACCESS AND DOES NOT RETURN

The reason is usually quite simple: The shell call to DOS through DOSEXEC.PIF waits for a keystroke to continue execution and to return to WINDOWS. You can change this behaviour by editing the DOSEXEC.PIF file (see the MicroEMACS section for further information).

13 FREQUENTLY ASKED QUESTIONS

ARE THERE ANY RESTRICTIONS IN THE USE OF THE ANALYSIS RESULTS?

No restrictions for registered users! They can use the results for all purposes like program documentation, customer information or debugging. A notice about the name of the program is very welcome.

WHY ARE THERE NO INTERACTIVE VERSIONS AVAILABLE?

Interactive menu driven SAA-like programs are user friendly but need a lot of programming work and require much memory. As the analysis tools need very much memory, especially for large software projects, I have decided to satisfy the memory needs first.
The main work should go into the internal analysis methods and not into the user interface layout. In a future release there may also be MS-Windows (3.1, Win32s, NT) versions with an interactive user interface. An advantage of the command line versions is that they can be run from within an editor or a MAKE file.

WHY ARE SEVERAL DATABASE FILES FOR EVERY PROJECT GENERATED?

Separating the analysis items (identifier names, file names, relationships, ...) of one project into several closely related database files is the best way to achieve minimum storage requirements and to optimise disk usage. This way of storage has no redundancies compared to storage in a single database file.

WHY IS THERE NO CROSS REFERENCE FOR VARIABLES INCLUDED?

This would need much additional memory and would slow down the analysis process. There would also be a lot of multiply defined names in different contexts to be managed if several files are analysed. There are also many existing tools which perform this task quite well.

WHY ARE CFT AND CST NOT COMBINED IN ONE PROGRAM?

Historical and practical reasons: the CFT development was started before CST, and both programs are optimised for their own special purposes. Combining them would complicate them and slow down the analysis process. The memory requirements would also grow.

WHY DO THE NEW SXT PROGRAM PACKAGES DXT, FXT AND LXT NOT START WITH VERSION 1.00?

Because they are directly derived from CXT. This means that they share a lot of common source code with CFT and CST. Every language independent feature is provided by all programs (see options). Therefore it is easier to have a similar version number for all SXT programs for maintenance and release purposes. This may change in future versions.

14 REFERENCES

Brian W. Kernighan, Dennis M. Ritchie: "The C Programming Language", Prentice Hall, Englewood Cliffs, Second Edition 1988

Samuel P. Harbison, Guy L.
Steele Jr.: "C: A Reference Manual", Prentice Hall, Englewood Cliffs, Third Edition 1991

Bjarne Stroustrup: "The C++ Programming Language", Addison-Wesley, Second Edition 1992

Margaret A. Ellis, Bjarne Stroustrup: "The Annotated C++ Reference Manual" (ARM), Addison-Wesley, Second Edition 1991

"Working Paper for Draft Proposed International Standard for Information Systems - Programming Language C++", AT&T, ANSI committee X3J16, ISO working group WG21, January 28, 1993

Bjarne Stroustrup, Keith Gorlen, Phil Brown, Dennis Mancl, Andrew Koenig: "UNIX System V - AT&T C++ Language System, Release 2.1 - Selected Readings", AT&T, 1989

Goldberg, A.: "Programmer as Reader", IEEE Software, September 1987

L.W. Cannon, R.A. Elliot, L.W. Kirchhoff, J.H. Miller, J.M. Milner, R.W. Mitze, E.P. Schan, N.O. Whittington, H. Spencer, D. Keppel, M. Brader: "Recommended C Style and Coding Standards", Technical Report, in the Public Domain, Revision 6.0, July 1991 (revised and updated version of the 'AT&T Indian Hill style guide', can be obtained via anonymous FTP from cs.washington.edu in '~ftp/pub/cstyle.tar.Z')

A. Dolenc, A. Lemmke, D. Keppel, G.V. Reilly: "Notes on Writing Portable Programs in C", Technical Report, in the Public Domain, Revision 8, November 1990 (can be obtained via anonymous FTP from cs.washington.edu in '~ftp/pub/cport.tar.Z')

M. Henricson, E. Nyquist: "Programming in C++, Rules and Recommendations", Technical Report, in the Public Domain, Ellemtel Telecommunication Systems Laboratories, Alvsjo/Sweden, Document No. M 90 0118 Uen, Rev. C (can be obtained via anonymous FTP from various sites as 'rules.ps.Z' or 'c++rules.ps.Z')

Compiler reference manuals and related documentation (language references, language implementations and extensions):

   - Microsoft C 5.1
   - Microsoft C 6.0
   - Microsoft C/C++ 7.0
   - Microsoft C/C++ for Windows NT (Beta Release March 1993)
   - Microsoft VC++ 1.0 for Windows NT (Beta Release June 1993)
   - Microsoft C for SCO UNIX System V Rel. 3.2
   - Microsoft Macro Assembler MASM 5.1
   - Borland Turbo C++ 1.0
   - Borland C++ 2.0
   - Borland C++ 3.1
   - Borland Turbo Assembler TASM 2.0
   - Intel 80860 Metaware High C i860 APX (UNIX-hosted)
   - Intel 80960 C-Compiler (ic960, ec960)
   - Intel 80960 Assembler (asm960)
   - GNU-960 Tools (UNIX-hosted)
   - GNU-C Compiler 2.2.2 (C, C++, Objective-C)
   - GNU Assembler
   - AT&T C++ 2.1 CFRONT (C++ to C translator) for SCO UNIX System V Rel. 3.2
   - IBM C-Compilers (CC, XLC) for IBM RS 6000 RISC stations, AIX 3.15
   - HP C-Compilers (CC, C89) for HP Apollo 9000 RISC stations, HP-UX 9.0
   - VAX C

15 TRADEMARKS

All brand or product names are trademarks (TM) or registered trademarks (R) of their respective owners.

The following products and names are Copyright (C) Juergen Mueller (J.M.), all rights reserved world-wide:

   CXT  (TM)   C EXPLORATION TOOLS
   CFT  (TM)   C FUNCTION TREE GENERATOR
   CFTN (TM)   C FUNCTION TREE NAVIGATOR
   CST  (TM)   C STRUCTURE TREE GENERATOR
   CSTN (TM)   C STRUCTURE TREE NAVIGATOR

   DXT  (TM)   DBASE EXPLORATION TOOLS
   DFT  (TM)   DBASE FUNCTION TREE GENERATOR
   DFTN (TM)   DBASE FUNCTION TREE NAVIGATOR

   FXT  (TM)   FORTRAN EXPLORATION TOOLS
   FFT  (TM)   FORTRAN FUNCTION TREE GENERATOR
   FFTN (TM)   FORTRAN FUNCTION TREE NAVIGATOR

   LXT  (TM)   LISP EXPLORATION TOOLS
   LFT  (TM)   LISP FUNCTION TREE GENERATOR
   LFTN (TM)   LISP FUNCTION TREE NAVIGATOR

The packages CXT, DXT, FXT and LXT are part of SXT (TM) SOFTWARE EXPLORATION TOOLS, which provide a similar set of functionalities for the source code analysis of different programming languages. See PRODUCT.DOC for a complete overview of the SXT packages and the different supported platforms.

APPENDIX 1: C-PRECOMPILER DEFINES

The following list shows the precompiler defines for the supported compiler types (option -T). It contains the default defines and the optional memory model and architecture defines. Other default compiler defines which are usually declared by some of the compilers are not automatically defined by the -T option.
These include defines for the kind of compilation such as WINDOWS,
__WINDOWS__, _Windows, DLL or __DLL__, defines for optimization such
as __OPTIMIZE__ or __FASTCALL__, and defines that identify the target
(operating) system such as NT, MIPS, UNIX, unix, __unix__, i386,
__i386__, GNUDOS, BSD, VMS, USG, DGUX or hpux. Other macros that are
sometimes predefined are __STRICT_ANSI__ and __CHAR_UNSIGNED__. If
necessary, they can be defined by the user on the command line with
the -D option. The macro name __cplusplus is defined if the command
line option '-C++' is set to enable C++ processing.

1. MSC51 (Microsoft C 5.1):
   Default defines:      MSDOS, M_I86
   C++ specific defines: (none)
   Memory model defines: M_I86SM, M_I86MM, M_I86CM, M_I86LM, M_I86HM

2. MSC70 (Microsoft C/C++ 7.0):
   Default defines:      MSDOS, M_I86, _MSC_VER (=700)
   C++ specific defines: (none)
   Memory model defines: M_I86TM, M_I86SM, M_I86MM, M_I86CM, M_I86LM,
                         M_I86HM

3. MSVCWNT (Microsoft VC++ 1.0 for Windows NT):
   Default defines:      MSDOS, M_I86, _MSC_VER (=800), _M_IX86 (=300)
   C++ specific defines: (none)
   Memory model defines: (not necessary)

4. TC10 (Borland Turbo C++ 1.0):
   Default defines:      __MSDOS__, __TURBOC__
   C++ specific defines: __TCPLUSPLUS__
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__,
                         __LARGE__, __HUGE__

5. BC20 (Borland C++ 2.0):
   Default defines:      __MSDOS__, __BORLANDC__ (=0x0200),
                         __TURBOC__ (=0x0297)
   C++ specific defines: __BCPLUSPLUS__ (=0x0200),
                         __TCPLUSPLUS__ (=0x0200)
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__,
                         __LARGE__, __HUGE__

6. BC31 (Borland C++ 3.1):
   Default defines:      __MSDOS__, __BORLANDC__ (=0x0410),
                         __TURBOC__ (=0x0410)
   C++ specific defines: __BCPLUSPLUS__ (=0x0310),
                         __TCPLUSPLUS__ (=0x0310)
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__,
                         __LARGE__, __HUGE__

7. BC10OS2 (Borland C++ 1.0 for OS/2):
   Default defines:      __OS2__, __BORLANDC__ (=0x0400),
                         __TURBOC__ (=0x0400)
   C++ specific defines: __BCPLUSPLUS__ (=0x0320),
                         __TCPLUSPLUS__ (=0x0320), __TEMPLATES__
   Memory model defines: (not necessary)

8. GNU (GNU C 2.2.2):
   Default defines:      __GNUC__ (=2)
   C++ specific defines: __GNUG__ (=2)
   Memory model defines: (not necessary)

9. I960 (Intel iC960 3.0):
   Default defines:      __i960
   C++ specific defines: (none)
   Memory model defines: (not necessary)
   Architecture defines: __i960KA, __i960KB, __i960SA, __i960SB,
                         __i960MC, __i960CA


APPENDIX 2: RESERVED C/C++ KEYWORDS

The following list shows the keywords recognised by CFT and CST: the
standard C keywords, the C++ keywords and the non-standard keywords
which are compiler-dependent extensions to the C or C++ language.
Standard C keywords are always C++ keywords as well. The C++ keywords
are recognised only if option '-C++' is set, otherwise they are
treated as identifiers. This list may not be complete or correct
because upcoming releases of the supported compilers may bring new
extensions, and the language standard itself may be extended. C++,
for which no 'real' language standard exists so far (except the
de-facto standard, the AT&T CFRONT implementation), differs among
implementations, especially for the newly introduced exception and
template concepts (try, catch, throw, template). Undocumented but
(obviously) present keywords, especially in GNU C (e.g. __alignof,
__classof, ...) or in Microsoft C/C++ 7.0, are ignored (even if they
are listed here).
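
As a small illustration of why it matters that C++ keywords are treated
as identifiers unless '-C++' is set, the following fragment is valid C
but uses several C++ keywords as ordinary names. It is only a sketch:
the structure, function and parameter names are invented for this
example and do not come from CFT, CST or any of the supported
compilers.

```c
/* Valid C, invalid C++: every name marked below is reserved in C++ only. */
struct node {
    int class;           /* 'class' is an ordinary member name in C */
    struct node *this;   /* so is 'this' */
};

/* 'new' as a function name and 'template' as a parameter name. */
int new(int template)
{
    return template + 1;
}

/* 'delete' as a function name and 'virtual' as a parameter name. */
int delete(const struct node *virtual)
{
    return virtual->class;
}
```

A file like this is meant to be read as plain C. With C++ processing
enabled, the same names would be taken as keywords instead of
identifiers.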
KEYWORDS    Standard    compiler-specific extension
            C   C++     MSC    TC/BC   GNU C
                        7.0    3.0     2.2.2

asm x
auto x
break x
case x
catch x (x) x
cdecl x x
char x
class x
classof x
const x
continue x
default x
delete x
do x
double x
dynamic x
else x
enum x
except x
exception x
extern x
far x x
float x
for x
fortran x x
friend x
goto x
huge x x
if x
inline x
int x
interrupt x x
long x
near x x
new x
operator x
overload x x
pascal x x
private x
protected x
public x
register x
return x
short x
signed x
sizeof x
static x
struct x
switch x
template x
this x
throw x
try x (x) x
typedef x
typeof x
union x
unsigned x
virtual x
void x
volatile x
while x
__alignof x
__alignof__ x
__asm x x
__asm__ x
__attribute x
__attribute__ x
__based x
__cdecl x
__classof x
__classof__ x
__const x x
__const__ x
__emit x
__except x
__export x
__extension__ x
__far x
__fastcall x
__finally x
__fortran x
__headof x
__headof__ x
__huge x
__inline x
__inline__ x
__interrupt x
__label__ x
__loadds x
__near x
__saveregs x
__segment x
__segname x
__self x
__signed x
__signed__ x
__stdcall x
__syscall x
__try x
__typeof x
__typeof__ x
__volatile x
__volatile__ x
_asm x
_based x
_cdecl x
_emit x
_export x x
_far x
_fastcall x
_fortran x
_huge x
_interrupt x
_loadds x x
_near x
_pascal x
_saveregs x x
_seg x
_segment x
_segname x
_self x


APPENDIX 3: EFFICIENCY

To provide some values about the speed and the efficiency of the
programs, tests were performed with CFT386 and CST386 (version 2.12),
running on a 33 MHz 80486 with 8 MB RAM, 256 KB cache and a 15 ms hard
disk (no disk cache or RAM-disk installed).

The source code for the first test was the C++ part of the GNU-C
compiler (version 2.2.2), which is the largest of the three compiler
parts (C, C++, Objective-C).
The following results have been found:

- 139 files (71 source files and 68 include files) have been scanned
- a total number of 2330 functions has been found from which 2248
  functions were defined in the 71 source files
- the directed call graph would have 2314 nodes and 10301 connections
- the critical function call path has a maximum nesting level of 115
- the total size of the 139 files is 6.532 MB with 208600 lines (about
  31 bytes/line), source code/filesize ratio 0.739, average function
  size is 1951 bytes resp. 63 lines
- the effective size of the preprocessed and scanned source code
  (source files and their included files) is 20.775 MB with 596500
  lines
- the resulting output file (options -m -rauspP -TGNU -cs -Cs -n) has
  about 3.94 MB and 36100 lines
- the resulting 6 database files have a size of 727 KB (source
  code/database ratio is about 9 : 1)
- inside BRIEF, a database search for the location of a function is
  performed in less than 4 seconds
- the total time for the complete processing was 31'03'' minutes with
  26'30'' for analysis (includes 18'15'' for preprocessing), 2'50'' for
  output file writing and 1'43'' for database writing
- the average analysis speed for this source code was about
  783 KB/min. respectively 22510 lines/min. (The values only for source
  scanning without preprocessing are: 2.51 MB/min. resp.
  72300 lines/min.)

The CFT386 results for a large commercial project are:

- 190 files (132 source files (C and assembler) and 58 include files)
  have been scanned
- a total number of 1223 functions has been found from which 1177
  functions were defined in the 132 source and in 3 include files (some
  include files contain inline functions)
- the directed call graph would have 1223 nodes and 2366 connections
- the total size of the 190 files is 6.22 MB with 145550 lines (about
  42 bytes/line), source code/filesize ratio 0.533, average function
  size is 1805 bytes resp. 66 lines
- the effective size of the preprocessed and scanned source code
  (source files and their included files) is 48.42 MB with 959100 lines
- the resulting output file (options -m -rauspP -cs -Cs -na) has about
  907 KB and 24700 lines
- the resulting 6 database files have a size of 306 KB (source
  code/database ratio is about 20 : 1)
- the total time for the complete processing was 35'25'' minutes with
  34'15'' for analysis, 0'45'' for output file writing and 0'25'' for
  database writing
- the average analysis speed for this source code was about
  1.41 MB/min. respectively 28000 lines/min.

To get some efficiency values for CST386, the include files from
another commercial project were analysed for data types:

- 52 include files have been scanned
- a total number of 605 data types have been found from which 567
  structures/unions were defined in 42 of the 54 include files
- the directed call graph would have 588 nodes and 1787 connections
- the total size of the 52 files is 1.384 MB with 25410 lines (about
  54 bytes/line), source code/filesize ratio 0.343
- the resulting output file (options -rasp -cs -Cs -n) has about 378 KB
  and 8740 lines
- the resulting 6 database files have a size of 312 KB (source
  code/database ratio is about 4.4 : 1)
- the total time for the complete processing was 1'10'' minutes with
  0'25'' for analysis, 0'16'' for output file writing and 0'29'' for
  database writing
- the average analysis (scanning) speed for this source code was about
  3.32 MB/min. respectively 60980 lines/min (note: NO preprocessing
  performed, only scanning!).

The calculated average values for the analysis speed differ due to the
effective size of the 'really' present source code in relation to the
size of the comments which can be seen by the code/filesize ratio.
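
The average speeds quoted in the three test reports appear to be the
effective size (or line count) of the scanned code divided by the
analysis time. The sketch below reproduces that arithmetic for the
GNU C++ test run. It is not part of CFT or CST: the function names are
invented for this example, and 1 MB is taken as 1000 KB, which is an
assumption that happens to match the quoted 783 KB/min.

```c
#include <stdio.h>

/* Convert a duration written as min'sec'' into fractional minutes. */
double to_minutes(int min, int sec)
{
    return (double)min + (double)sec / 60.0;
}

/* Average throughput: amount (KB or lines) per minute of analysis. */
double per_minute(double amount, int min, int sec)
{
    return amount / to_minutes(min, sec);
}

/* Print the derived speeds for the GNU C++ test run (assumed inputs
   are the figures quoted in the report above). */
void print_gnu_run(void)
{
    double kb = 20775.0;      /* 20.775 MB effective size, 1 MB = 1000 KB */
    double lines = 596500.0;  /* effective number of lines */

    /* Whole analysis phase: 26'30'' including preprocessing. */
    printf("analysis: %.0f KB/min, %.0f lines/min\n",
           per_minute(kb, 26, 30), per_minute(lines, 26, 30));

    /* Scanning only: 26'30'' minus 18'15'' preprocessing = 8'15''. */
    printf("scanning: %.2f MB/min, %.0f lines/min\n",
           per_minute(kb, 8, 15) / 1000.0, per_minute(lines, 8, 15));
}
```

Calling print_gnu_run() from a small main() should give values close
to the rounded figures in the report. The same two helpers can be
applied to the other two test runs by substituting their sizes and
analysis times.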
The speed values do not consider that, if the preprocessing option -P
is set, the source code is first preprocessed to a temporary file and
then analysed in a second step, so that large parts of the source code
are read twice (original and preprocessed code) and written once
(intermediate preprocessor output). With these facts in mind, the
analysis speed of CFT and CST seems to be quite acceptable!


APPENDIX 4: SYSTEM REQUIREMENTS

DOS real mode versions:
- IBM-AT or 100% compatible with Intel 80286 or higher, 512 KB RAM,
  hard disk, DOS 3.3 or higher

DOS protected mode versions:
- IBM-AT or 100% compatible with Intel 80386+80387 or higher, 2 MB RAM,
  hard disk, DOS 3.3 or higher


APPENDIX 5: INSTALLATION

See INSTALL.DOC for information.

(THIS DOCUMENT HAS 69 PAGES)